ROBUST CODING FOR MULTIPLE-ACCESS CHANNELS


Evaggelos Geraniotis, Member IEEE

Electrical Engineering Department
and Systems Research Center
University of Maryland
College Park, MD 20742

ABSTRACT

The problem of minimax robust coding for classes of multiple-access channels with uncertainty in their statistical description is addressed. We consider: (i) discrete memoryless multiple-access channels with uncertainty in the probability transition matrices and (ii) discrete-time stationary additive Gaussian multiple-access channels with spectral uncertainty. The uncertainty is modeled using classes determined by 2-alternating Choquet capacities. Both block codes and tree codes are considered. A robust maximum-likelihood decoding rule is derived which guarantees that, for all two-user channels in the uncertainty class and all pairs of code rates in a critical rate region, the average probability of decoding error for the ensemble of pairs of random block codes and the ensemble of pairs of random tree codes converges to zero exponentially with increasing block length or constraint length, respectively. The channel capacity and cut-off rate regions of the class are then evaluated.
by the Systems Research Center at the University of Maryland, College Park, through National Science Foundation CDR-85-00108.


I. INTRODUCTION

For two-user discrete memoryless multiple-access channels whose statistical description (i.e., the probability transition matrix which determines the channel) is known, the coding theorems of [1]-[3] guarantee that, if the pair of coding rates lies in a critical region (termed the achievable rate region), there exists a pair of block codes such that the error probability of the decoder approaches zero exponentially with increasing block length. Similar results for two-user tree codes were established in [4].

For channels whose statistical description is not perfectly known, but whose determining quantity (e.g., the transition probability matrix) belongs to a class, the achievable region was derived in [5] for arbitrarily varying multiple-access channels (MAC's). In [6] a universal coding approach was applied to discrete memoryless MAC's. According to this approach, a finite number of representative channels exists such that, if we code for these channels, all the other channels in the class have asymptotically optimal coded performance. Two possible disadvantages are: (i) a large number of representative channels may be necessary, and (ii) the construction of the representative channels for a given class can be very complicated.

In [7] another method of universal coding, which does not use the notion of representative channels, was introduced. In this method a "packing lemma" selects the positions of the codewords independently of the channel and is used to upperbound the decoding error. The decoding rule employed is termed "maximum mutual information decoding" and is likewise independent of the channel statistics.
Here we consider another approach, termed minimax robust coding, which is based on a worst-case design. The least-favorable channel is singled out and its probability transition matrix is used for maximum-likelihood decoding. Then the probability of error for the ensemble of two-user random block codes approaches zero exponentially with increasing block length for all channels in the class. The disadvantage is that the asymptotic performance for all but the least-favorable channel in the class is not optimal. However, this approach requires only one representative channel for the class (the least-favorable one), which can be explicitly found in several interesting cases. For single-user channels this approach was first considered in [8], and for specific uncertainty classes in [9], the companion to this paper. By restricting attention to specific uncertainty classes of channels we can obtain an explicit characterization of the capacity region and of the maximum-likelihood decoding rule which ensures the asymptotic convergence of the probability of decoding error to zero for all channels in the class. Therefore this paper is to multiple-access channels what the work of [9] is to the ordinary Shannon channel. In contrast, the more general (and thus less explicit) characterization of capacity regions in [5] is to multiple-access channels what the compound-channel work of [10] is to the ordinary Shannon channel.

In this paper we apply the minimax robust coding approach for block and tree codes to two-user discrete memoryless (DM) MAC's and discrete-time stationary additive Gaussian (SG) MAC's which belong to uncertainty classes determined by 2-alternating Choquet capacities [11]. Our choice of these uncertainty models is justified in two ways. First, important uncertainty models like contaminated mixtures [12], total-variation neighborhoods [12], band models [13] and extended p-point models [14] are capacity classes and have played an important role in hypothesis testing [15] and filtering [16]. Second, the least-favorable channels can be explicitly found for the uncertainty classes described by any of the above models.

Although in this paper we restrict attention to DM-MAC's and discrete-time stationary Gaussian channels (SGC's) (continuous-time SGC's are also discussed), our results can be extended to other classes of MAC's, e.g., first-order Markov MAC's. As is common in multi-user information theory, the results are established for two-user MAC's; the extension to the multi-user case is then quite straightforward.

The paper is organized as follows. Minimax robust coding for discrete memoryless MAC's with uncertainty in the probability transition matrices is discussed in Section II, and minimax robust coding for discrete-time stationary Gaussian MAC's with uncertainty in the spectral density of the additive Gaussian noise is discussed in Section III. In each of these sections we first formulate the problem and introduce the necessary concepts and notation. Next, we present channel coding theorems for both block codes and tree codes for the case of mismatch, i.e., when the decoder employs a maximum-likelihood rule which is based on inaccurate knowledge of the channel statistics. Finally, we derive minimax robust coding theorems for the ensemble of two-user random block codes and the ensemble of two-user random tree codes on channels with statistical uncertainty determined by Choquet capacities, and we evaluate the channel capacity region and the cut-off rate (actually the general error exponent) region for the class of channels.

Then, in Section IV, a brief summary of this paper and some conclusions are presented.
II. ROBUST CODING FOR DISCRETE MEMORYLESS MULTIPLE-ACCESS CHANNELS
A. Channel Uncertainty Determined by 2-alternating Capacities
Suppose that for a two-user channel $X_1$ and $X_2$ are the input alphabets, $Y$ is the output alphabet, and $F = \sigma(Y)$ is the $\sigma$-algebra of subsets of $Y$. A discrete memoryless two-user MAC is characterized by its transition probability matrix $p(y|x_1,x_2)$, $x_1 \in X_1$, $x_2 \in X_2$, $y \in Y$. For each $x = (x_1,x_2) \in X_1 \times X_2$ consider the conditional probability measure $P_x(A) = \int_A dP(y|x_1,x_2)$, where $A \in F$. Let $p(y|x_1,x_2)$ denote the Radon-Nikodym derivative of $P_x$ with respect to a measure $\lambda$. The reference measure $\lambda$ is chosen according to the particular case of interest. Thus, if the alphabet $Y$ is a continuum, $\lambda$ is the Lebesgue measure on $Y$. If $Y$ is discrete (e.g., a finite set), then $\lambda$ is the measure which assigns equal mass to all the elements of $Y$. Finally, if $Y$ has both discrete and continuous components, then $\lambda$ turns out to be a convex combination of the Lebesgue measure on the continuous part of $Y$ and the measure that assigns equal mass to all the elements of the discrete part of $Y$.

We assume that for each $x \in X_1 \times X_2$ the probability measures $P_x$ are only known to lie in a convex class generated by a Choquet 2-alternating capacity [11]

$$\mathcal{P}_{v_x} = \{P \in \mathcal{P} \mid P(A) \le v_x(A),\ \forall A \in F\} \qquad (1)$$

where $\mathcal{P}$ denotes the class of all probability measures on $(Y,F)$, and $v_x$ is a 2-alternating capacity on $(Y,F)$ with $v_x(Y) = 1$. For notational convenience, in the sequel we will drop the dependence of $v_x$ and $\mathcal{P}_{v_x}$ on $x$.

A Choquet 2-alternating capacity [11] on $(Y,F)$ is a finite set function which is increasing, continuous from below, and continuous from above on closed sets, and which satisfies $v(\emptyset) = 0$ and $v(A \cup B) + v(A \cap B) \le v(A) + v(B)$ for all $A, B \in F$. Notice that any finite measure $v$ is a 2-alternating capacity; in this case (with $v(Y) = 1$) the uncertainty class generated by (1) reduces to $\mathcal{P}_v = \{v\}$. If we further assume that $Y$ is compact, then all the uncertainty models mentioned in Section I are capacity classes. If $Y$ is not compact [e.g., $Y = (-\infty, \infty)$], only the band model can be defined in terms of a capacity.

An example of a 2-alternating capacity class is the total-variation neighborhood model [12] defined by

$$\mathcal{P}_v = \{P \in \mathcal{P} \mid |P_0(A) - P(A)| \le \varepsilon,\ \forall A \in F\} \qquad (2)$$

where $P_0$ is a known measure (not necessarily a probability measure) and $\varepsilon$ in $[0,1]$ is the degree of uncertainty in the model. Then (2) can be expressed in the form (1) if we set

$$v(A) = \min\{P_0(A) + \varepsilon,\ 1\} \ \text{ for } A \ne \emptyset, \qquad v(\emptyset) = 0, \qquad (3)$$

which is a 2-alternating capacity. See [12], [13] and [14] for a description of other capacity classes.
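To make (3) concrete, the following Python sketch checks, on a three-letter output alphabet, that $v(A) = \min\{P_0(A) + \varepsilon, 1\}$ (with $v(\emptyset) = 0$) satisfies the 2-alternating inequality, and that membership in the class (1) reproduces the total-variation ball (2). It is only an illustration: the finite alphabet, the values of $P_0$ and $\varepsilon$, and all function names are our own assumptions, and brute-force subset enumeration is practical only for very small alphabets.

```python
from itertools import combinations


def all_subsets(n):
    """All subsets of {0, ..., n-1} as frozensets."""
    return [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]


def tv_capacity(p0, eps):
    """Set function (3): v(A) = min{P0(A) + eps, 1} for nonempty A, v(empty) = 0."""
    def v(A):
        return 0.0 if not A else min(sum(p0[y] for y in A) + eps, 1.0)
    return v


def is_two_alternating(v, n, tol=1e-12):
    """Check v(A | B) + v(A & B) <= v(A) + v(B) for every pair of subsets."""
    subsets = all_subsets(n)
    return all(v(A | B) + v(A & B) <= v(A) + v(B) + tol
               for A in subsets for B in subsets)


def in_class(P, v, n, tol=1e-12):
    """Membership in the class (1): P(A) <= v(A) for every subset A."""
    return all(sum(P[y] for y in A) <= v(A) + tol for A in all_subsets(n))


v = tv_capacity([0.7, 0.2, 0.1], eps=0.1)
print(is_two_alternating(v, 3))           # True
print(in_class([0.6, 0.2, 0.2], v, 3))    # True:  total-variation distance 0.1
print(in_class([0.5, 0.3, 0.2], v, 3))    # False: total-variation distance 0.2
```

Because $P(A) \le P_0(A) + \varepsilon$ for every $A$ also bounds $P(A^c)$, the single upper bound of (1) yields the two-sided condition of (2) when $P_0$ is a probability measure.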

In the sequel we will need the following fundamental result, which is due to Huber and Strassen [14]:

Lemma 1: If $v$ is a 2-alternating capacity on $(Y,F)$ and $\mathcal{P}_v$ is a convex class of probability measures determined by it as in (1), then there exists a unique $\lambda$-measurable function $\pi_v : Y \to [0,\infty]$ with the defining property that, for each $\theta \in [0,\infty]$ and $A_\theta$ defined by $A_\theta = \{\pi_v > \theta\}$,

$$\theta\,\lambda(A_\theta) + v(A_\theta^c) \le \theta\,\lambda(A) + v(A^c) \qquad (4)$$

for all $A \in F$. Furthermore, there exists a measure $\hat P$ in $\mathcal{P}_v$ such that for all $\theta \in [0,\infty]$

$$\hat P(\{\pi_v < \theta\}) = v(\{\pi_v < \theta\}), \qquad (5)$$

which means that $\hat P$ makes $\pi_v$ stochastically smallest over all $P$ in $\mathcal{P}_v$, and $\pi_v$ is a version of $d\hat P/d\lambda$, the generalized Radon-Nikodym (R-N) derivative of $\hat P$ with respect to $\lambda$; that is, $d\hat P/d\lambda$ may be infinite on a set of $\lambda$ measure 0.

The function $\pi_v$ is termed the Huber-Strassen derivative of $v$ with respect to $\lambda$ ($v$ may not be a measure). The probability measure $\hat P$ singled out by Lemma 1 is termed the least-favorable measure of the class $\mathcal{P}_v$. Let $\hat P = \hat P' + \hat P''$ be the Lebesgue decomposition of $\hat P$, where $\hat P'$ is absolutely continuous with respect to $\lambda$ and $\hat P''$ is singular with respect to $\lambda$ (that is, it concentrates all its mass on sets of $\lambda$ measure 0). Then,

$$\hat P'(A) = \int_A \pi_v \, d\lambda \qquad (6a)$$

and

$$\hat P''(A) = v(A \cap \{\pi_v = \infty\}) \qquad (6b)$$

for all $A \in F$. For example, for the total-variation model of (2) the Huber-Strassen derivative $\pi_v = \hat p$ is defined as

$$\hat p(y) = \max\{c',\ \min\{c'',\ \pi_0(y)\}\} \qquad (7)$$

where $\pi_0 = dP_0/d\lambda$ is the R-N derivative of $P_0$ of (2) and $c'$, $c''$ are chosen so that $\hat P(Y) = 1$. See [12]-[14] for the definition of $\hat p$ for the other capacity classes.
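For a finite output alphabet, (7) and the defining properties (4) and (5) of Lemma 1 can be checked by brute force. The Python sketch below assumes that $\lambda$ is the counting measure (unit mass on each letter, so $\pi_0(y) = P_0(\{y\})$) and fixes $c'$ and $c''$ by the mass-balance conditions of the standard total-variation construction, $\sum_y (c' - \pi_0(y))^+ = \varepsilon$ and $\sum_y (\pi_0(y) - c'')^+ = \varepsilon$, which in particular give $\hat P(Y) = 1$. When $\varepsilon$ is large enough for the uniform law to lie in the class, the sketch returns the uniform law. These choices and the helper names are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def bisect_root(fun, lo, hi, iters=200):
    """Root of a nondecreasing continuous function on [lo, hi] (fun(lo) < 0 <= fun(hi))."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fun(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def least_favorable_tv(p0, eps):
    """Clipped density max{c', min{c'', pi0}} of (7) on a finite alphabet.

    Assumes lambda = counting measure (pi0(y) = P0({y})), 0 < eps < 1, and chooses
    c' and c'' so that sum (c' - pi0)^+ = eps and sum (pi0 - c'')^+ = eps.
    """
    n = len(p0)
    c_lo = bisect_root(lambda c: sum(max(c - p, 0.0) for p in p0) - eps, 0.0, 1.0)
    c_hi = bisect_root(lambda c: eps - sum(max(p - c, 0.0) for p in p0), 0.0, 1.0)
    if c_lo >= c_hi:                   # uniform law already within distance eps
        return [1.0 / n] * n
    return [max(c_lo, min(c_hi, p)) for p in p0]


def check_lemma1(p0, eps, pi, thetas, tol=1e-9):
    """Brute-force check of (4) and (5) for the capacity (3), counting measure lambda."""
    n = len(p0)
    full = frozenset(range(n))
    subsets = [frozenset(c) for r in range(n + 1) for c in combinations(range(n), r)]

    def v(A):                          # capacity (3) with v(empty) = 0
        return 0.0 if not A else min(sum(p0[y] for y in A) + eps, 1.0)

    for theta in thetas:
        A_theta = frozenset(y for y in range(n) if pi[y] > theta)
        best = min(theta * len(A) + v(full - A) for A in subsets)
        if theta * len(A_theta) + v(full - A_theta) > best + tol:      # (4)
            return False
        below = frozenset(y for y in range(n) if pi[y] < theta)
        if abs(sum(pi[y] for y in below) - v(below)) > tol:             # (5)
            return False
    return True


p0, eps = [0.7, 0.2, 0.1], 0.1
pi = least_favorable_tv(p0, eps)
print([round(p, 6) for p in pi])                                  # [0.6, 0.2, 0.2]
print(check_lemma1(p0, eps, pi, [k / 20 for k in range(21)]))     # True
```

The clipping moves $\varepsilon$ of probability from the most likely output to the least likely ones. This flattens the density, which is consistent with $\hat P$ being the noisiest member of the class.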

We emphasize that, for the case treated in this section of the paper, the probability measure $\hat P_x$ and the Choquet capacity $v_x$ actually depend on $x$ (e.g., $v_x$ of (3) depends on $x$ through $P_{0,x}$ and $\varepsilon_x$, which vary with $x$), and so does $\hat p_x = \pi_{v_x}$.

It should be noted that Huber-Strassen derivatives of generalized capacities [a generalized capacity is defined in the same way as a 2-alternating capacity, except that it is required to be continuous from above on compact (and not just closed) sets] with respect to $\sigma$-finite (and not just finite) measures can be constructed [20, Chapter IV]. Then Lemma 1 still holds, provided that it is properly modified. One of the implications of this extension is that several of the most useful examples of capacity classes (e.g., $\varepsilon$-mixtures and total-variation neighborhoods) are generalized capacities when $Y$ is $\sigma$-compact (and not just compact).

B. Mismatch Coding Theorems for Two-User Block and Tree Codes
Suppose that, in the presence of uncertainty about $P_x(A)$, $x \in X_1 \times X_2$, $A \in F$, the decoder mistakenly assumes that $\tilde P_x$ is the probability distribution governing the statistics of the DM-MAC (or attempts to estimate $P_x$ and comes up with the estimate $\tilde P_x$). It therefore uses a maximum-likelihood (ML) decoding rule based on $\tilde p(y|x) = (d\tilde P_x/d\lambda)(y)$ instead of the true $p(y|x) = (dP_x/d\lambda)(y)$. This situation is characterized by mismatch. Then the following result holds for block codes used on a two-user MAC [6]:

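Theorem 1: Consider a DM-MAC characterized by $P_x(A)$, $A \in F = \sigma(Y)$. Let $Q_j$, $j = 1,2$, be an arbitrary probability assignment on the user-$j$ channel input symbols. Suppose the decoder employs inaccurate ML decoding based on $\tilde p(\cdot|\cdot,\cdot)$ instead of the true probability transition matrix $p(\cdot|\cdot,\cdot)$. Then, for $R = (R_1,R_2,R_3)$ satisfying

$$R_j < I_j(Q,\tilde p;P), \quad j = 1,2,3, \qquad (8)$$

where $R_3 = R_1 + R_2$, $Q = (Q_1,Q_2)$,

$$I_1(Q,\tilde p;P) = \int_{X_1}\int_{X_2}\int_Y \ln\frac{\tilde p(y|x_1,x_2)}{\int_{X_1}\tilde p(y|x_1',x_2)\,dQ_1(x_1')}\,dP_x(y)\,dQ_1(x_1)\,dQ_2(x_2), \qquad (9a)$$

$$I_2(Q,\tilde p;P) = \int_{X_1}\int_{X_2}\int_Y \ln\frac{\tilde p(y|x_1,x_2)}{\int_{X_2}\tilde p(y|x_1,x_2')\,dQ_2(x_2')}\,dP_x(y)\,dQ_1(x_1)\,dQ_2(x_2), \qquad (9b)$$

$$I_3(Q,\tilde p;P) = \int_{X_1}\int_{X_2}\int_Y \ln\frac{\tilde p(y|x_1,x_2)}{\int_{X_1}\int_{X_2}\tilde p(y|x_1',x_2')\,dQ_1(x_1')\,dQ_2(x_2')}\,dP_x(y)\,dQ_1(x_1)\,dQ_2(x_2), \qquad (9c)$$

the average probability of decoding error $P_e$ over the ensemble of pairs of random block codes of rates $(R_1,R_2)$ and length $n$ (for which the $n$ letters of each codeword are chosen from the input alphabets $X_1$ and $X_2$ independently and according to $Q_1$ and $Q_2$, respectively, while the $e^{nR_1}$ and $e^{nR_2}$ codewords are mutually independent and equiprobable) is upperbounded by $P_n(\boldsymbol\rho,Q,\tilde p;P)$ given by

$$P_n(\boldsymbol\rho,Q,\tilde p;P) = \sum_{j=1}^{3}\exp\{-n[E_j(\rho_j,Q,\tilde p;P) - \rho_j R_j]\} \qquad (10)$$

where $\boldsymbol\rho = (\rho_1,\rho_2,\rho_3)$ and, for $\rho$ in $[0,1]$,

$$E_1(\rho,Q,\tilde p;P) = -\ln\Big\{\int_{X_1}\int_{X_2}\int_Y \tilde p(y|x_1,x_2)^{-\rho/(1+\rho)}\Big[\int_{X_1}\tilde p(y|x_1',x_2)^{1/(1+\rho)}\,dQ_1(x_1')\Big]^{\rho}\,dP_x(y)\,dQ_1(x_1)\,dQ_2(x_2)\Big\}, \qquad (11a)$$

$$E_2(\rho,Q,\tilde p;P) = -\ln\Big\{\int_{X_1}\int_{X_2}\int_Y \tilde p(y|x_1,x_2)^{-\rho/(1+\rho)}\Big[\int_{X_2}\tilde p(y|x_1,x_2')^{1/(1+\rho)}\,dQ_2(x_2')\Big]^{\rho}\,dP_x(y)\,dQ_1(x_1)\,dQ_2(x_2)\Big\}, \qquad (11b)$$

and

$$E_3(\rho,Q,\tilde p;P) = -\ln\Big\{\int_{X_1}\int_{X_2}\int_Y \tilde p(y|x_1,x_2)^{-\rho/(1+\rho)}\Big[\int_{X_1}\int_{X_2}\tilde p(y|x_1',x_2')^{1/(1+\rho)}\,dQ_1(x_1')\,dQ_2(x_2')\Big]^{\rho}\,dP_x(y)\,dQ_1(x_1)\,dQ_2(x_2)\Big\}. \qquad (11c)$$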

For this theorem to be valid it is required that the mismatch mutual informations $I_j(Q,\tilde p;P)$, $j = 1,2,3$, of (9a)-(9c) be strictly positive and that the exponents $E_j(\rho,Q,\tilde p;P)$, $j = 1,2,3$, of (11a)-(11c) be strictly positive for all $\rho$ in $(0,1]$ (at $\rho = 0$ each $E_j$ vanishes identically). These positivity requirements are satisfied for the choice of $\tilde p$ in Theorem 3 below.
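A direct transcription of (10) and (11a)-(11c) for finite alphabets may help when experimenting with the positivity conditions. The Python sketch below assumes that $\lambda$ is the counting measure, so that $p$ and $P_x$ coincide with the transition probabilities. It stores channels as nested lists indexed `[x1][x2][y]` and requires $\tilde p > 0$ wherever the true channel has positive probability. The function names and the noiseless binary adder channel $y = x_1 + x_2$ used in the example are illustrative assumptions. Evaluating the sketch at $\rho = 0$ returns $E_j = 0$ for every $j$, in line with the remark above.

```python
import math


def exponents(rho, Q1, Q2, Wt, W):
    """E_1, E_2, E_3 of (11a)-(11c) for finite alphabets.

    W[x1][x2][y]  : true transition probabilities (lambda = counting measure)
    Wt[x1][x2][y] : metric p-tilde used by the (possibly mismatched) ML decoder
    Assumes Wt[x1][x2][y] > 0 wherever W[x1][x2][y] > 0.
    """
    s = 1.0 / (1.0 + rho)
    n1, n2, ny = len(Q1), len(Q2), len(W[0][0])
    totals = [0.0, 0.0, 0.0]
    for a in range(n1):
        for b in range(n2):
            for y in range(ny):
                if W[a][b][y] == 0.0:
                    continue  # zero-probability outputs do not contribute to dP_x
                weight = Q1[a] * Q2[b] * W[a][b][y] * Wt[a][b][y] ** (-rho * s)
                inner = (
                    sum(Q1[a2] * Wt[a2][b][y] ** s for a2 in range(n1)),   # (11a)
                    sum(Q2[b2] * Wt[a][b2][y] ** s for b2 in range(n2)),   # (11b)
                    sum(Q1[a2] * Q2[b2] * Wt[a2][b2][y] ** s              # (11c)
                        for a2 in range(n1) for b2 in range(n2)),
                )
                for j in range(3):
                    totals[j] += weight * inner[j] ** rho
    return [-math.log(t) for t in totals]


def block_error_bound(n, rhos, R1, R2, Q1, Q2, Wt, W):
    """Bound (10): sum over j of exp(-n [E_j(rho_j) - rho_j R_j]), R_3 = R_1 + R_2."""
    rates = (R1, R2, R1 + R2)
    total = 0.0
    for j in range(3):
        E_j = exponents(rhos[j], Q1, Q2, Wt, W)[j]
        total += math.exp(-n * (E_j - rhos[j] * rates[j]))
    return total


# Noiseless binary adder MAC y = x1 + x2 (illustrative example), uniform inputs.
adder = [[[1.0 if y == a + b else 0.0 for y in range(3)] for b in range(2)] for a in range(2)]
Q = [0.5, 0.5]
print(exponents(1.0, Q, Q, adder, adder))   # approx [0.6931, 0.6931, 0.9808]
print(block_error_bound(50, (1.0, 1.0, 1.0), 0.3, 0.3, Q, Q, adder, adder))
```

For the noiseless adder with uniform inputs and matched decoding, one can confirm by hand from (11a)-(11c) that $E_1(1) = E_2(1) = \ln 2$ and $E_3(1) = \ln(8/3)$.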

The achievable region for the two-user MAC and inaccurate ML decoding is then defined as the closure of the convex hull of the union of the sets $\mathcal{R}(Q)$ of rate pairs $(R_1,R_2)$ which satisfy (8) as $Q = (Q_1,Q_2)$ ranges over all possible pairs of probability measures on $X_1 \times X_2$.
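The conditions (8) can be evaluated numerically for finite alphabets. The Python sketch below transcribes (9a)-(9c) under the assumption that $\lambda$ is the counting measure, with channels stored as nested lists indexed `[x1][x2][y]` and $\tilde p > 0$ wherever the true channel has positive probability. The helper names and the "noisy binary adder" channel are hypothetical examples. The sketch also lets one observe the right-hand inequality in (13): with the same true channel, a mismatched metric never yields larger values than the matched one.

```python
import math


def mismatch_mi(Q1, Q2, Wt, W):
    """I_1, I_2, I_3 of (9a)-(9c) for finite alphabets (lambda = counting measure).

    W[x1][x2][y] is the true channel, Wt[x1][x2][y] the decoder's metric p-tilde.
    Assumes Wt[x1][x2][y] > 0 wherever W[x1][x2][y] > 0.
    """
    n1, n2, ny = len(Q1), len(Q2), len(W[0][0])
    I = [0.0, 0.0, 0.0]
    for a in range(n1):
        for b in range(n2):
            w = Q1[a] * Q2[b]
            for y in range(ny):
                if w == 0.0 or W[a][b][y] == 0.0:
                    continue
                denominators = (
                    sum(Q1[a2] * Wt[a2][b][y] for a2 in range(n1)),   # (9a)
                    sum(Q2[b2] * Wt[a][b2][y] for b2 in range(n2)),   # (9b)
                    sum(Q1[a2] * Q2[b2] * Wt[a2][b2][y]               # (9c)
                        for a2 in range(n1) for b2 in range(n2)),
                )
                for j in range(3):
                    I[j] += w * W[a][b][y] * math.log(Wt[a][b][y] / denominators[j])
    return I


def satisfies_8(R1, R2, I):
    """Conditions (8): R_1 < I_1, R_2 < I_2 and R_1 + R_2 < I_3 (rates in nats)."""
    return R1 < I[0] and R2 < I[1] and R1 + R2 < I[2]


def noisy_adder(d):
    """Adder MAC y = x1 + x2; with probability d the output moves to one of the
    other two symbols (d/2 each)."""
    return [[[1 - d if y == a + b else d / 2 for y in range(3)]
             for b in range(2)] for a in range(2)]


Q = [0.5, 0.5]
W = noisy_adder(0.1)                                 # true channel
matched = mismatch_mi(Q, Q, W, W)                    # decoder knows the channel
mismatched = mismatch_mi(Q, Q, noisy_adder(0.3), W)  # decoder assumes d = 0.3
print(matched, mismatched)
print(satisfies_8(0.2, 0.2, matched), satisfies_8(0.2, 0.2, mismatched))
```

For the noiseless case `noisy_adder(0.0)` with matched decoding, the three values reduce to $\ln 2$, $\ln 2$ and $1.5\ln 2$ nats, the familiar rates of the binary adder MAC with uniform inputs.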


Remark 1. We used the notation $I_j(Q,\tilde p;P)$ for $j = 1,2,3$ instead of $I(X_1;Y|X_2)$, $I(X_2;Y|X_1)$, and $I(X_1,X_2;Y)$, respectively, to emphasize the dependence of the mismatch mutual information functions on $Q$ and on both $\tilde p$ and $P$; the notation $I(X_1;Y|X_2)$ is usually reserved for the matched case ($\tilde p = p$). Also notice that for notational convenience we have dropped the dependence of $\tilde p_x$, $\tilde P_x$, $p_x$, and $P_x$ on $x$.

X  X  X  X 

Remark 2. We consider Theorem 1 important in two ways: as a fundamental intermediate result necessary for the proof of Theorem 3 below, and as an interesting independent result which completely characterizes the achievable rate region for the case of mismatch (i.e., when the actual channel probability transition matrix $p$ is different from the estimate $\tilde p$ employed in the ML decoding).

For two-user tree codes and a decoder which employs an ML test based on $\tilde p(\cdot|\cdot,\cdot)$ instead of the true $p(\cdot|\cdot,\cdot)$ (about which there is uncertainty), the following result holds:

Theorem 2: Under the assumptions of Theorem 1, suppose that user $j$ ($j = 1,2$) is assigned a tree code of rate $R_j = \frac{1}{N}\ln M_j$ nats per channel symbol satisfying (8) and constraint length $K$, and consider the ensemble of random two-user tree codes generated by assigning $N$ channel input letters, independently and according to the probability distribution $Q_j$, to the branches of the trees. Then the average probability of decoding error $P_e$ over the above ensemble of pairs of tree codes is upperbounded by $P_K(\boldsymbol\rho,Q,\tilde p;P)$ given by

$$P_K(\boldsymbol\rho,Q,\tilde p;P) = \cdots + e^{-KNE_3(\rho_3,Q,\tilde p;P)}\, f\big(N[E_3(\rho_3,Q,\tilde p;P) - \rho_3(R_1+R_2)]\big)\cdot\big\{1 + f\big(N[E_1(\rho_3,Q,\tilde p;P) - \rho_3 R_1]\big) + f\big(N[E_2(\rho_3,Q,\tilde p;P) - \rho_3 R_2]\big)\big\} \qquad (12)$$

where $f(x) = e^{-x}/(1 - e^{-x})$, $\boldsymbol\rho = (\rho_1,\rho_2,\rho_3)$, $0 \le \rho_j \le \min\{E_j(\rho_j,Q,\tilde p;P)/R_j,\ 1\}$ for $j = 1,2$, and $0 \le \rho_3 \le \min\{E_1(\rho_3,Q,\tilde p;P)/R_1,\ E_2(\rho_3,Q,\tilde p;P)/R_2,\ E_3(\rho_3,Q,\tilde p;P)/(R_1+R_2),\ 1\}$. The exponents $E_j(\rho,Q,\tilde p;P)$ are defined by (11a)-(11c). For this theorem to be valid it is required that $I_j(Q,\tilde p;P) > 0$ and $E_j(\rho,Q,\tilde p;P) > 0$ for $\rho$ in $(0,1]$ and $j = 1,2,3$; these conditions are satisfied for the choice of $\tilde p$ in Theorem 4 below.

The proof of Theorem 2 is based on a straightforward modification of the proof for the case with accurate ML decoding (i.e., $\tilde p = p$) given in [4]. The same arguments as in [4] may be used, the only difference being that the exponents $E_j(\rho,Q,\tilde p;P)$, instead of the usual Liao error exponents $E_j(\rho,Q) = E_j(\rho,Q,p;P)$, are involved in the equations, since the decoder now employs $\tilde p$ and not $p$ for the ML decision rule.

C. Minimax Robust Coding Theorems for Two-User Block and Tree Codes
In this subsection we assume that the probability measure $P_x$ which governs the statistics of the channel is only known to lie in a class of the form (1) described in Section II.A. The channel decoder employs an ML decoding rule based on $\tilde p$ in the way described in Theorems 1 and 2. The goal is to choose $\tilde p$ so that, for all pairs of code rates inside a critical rate region, the probability of decoding error approaches zero with increasing block length (or constraint length) for all channels in the class.

Equipped with Theorems 1 and 2 and the Huber-Strassen theory of least-favorability (as condensed in Lemma 1), we now prove the main results of this section.

Theorem 3: Suppose the probability measure $P_x$ on $Y$ belongs to a class of the form (1) and $\hat P_x$ is the element of the class singled out by Lemma 1. Suppose further that the decoder's ML decoding rule is based on $\hat p = d\hat P_x/d\lambda$. Then the following inequalities are true for all pairs of probability measures $Q = (Q_1,Q_2)$ on $X_1 \times X_2$, all $\rho$ in $[0,1]$, all $P$ in $\mathcal{P}_{v_x}$, and all decoding metrics $\tilde p$:

$$I_j(Q,\hat p;P) \ge I_j(Q,\hat p;\hat P) \ge I_j(Q,\tilde p;\hat P), \quad j = 1,2,3 \qquad (13)$$

and

$$E_j(\rho,Q,\hat p;P) \ge E_j(\rho,Q,\hat p;\hat P) \ge E_j(\rho,Q,\tilde p;\hat P), \quad j = 1,2,3. \qquad (14)$$

Furthermore, the operating point $(\hat{\boldsymbol\rho},\hat Q,\hat p)$, where $(\hat{\boldsymbol\rho},\hat Q) = \arg\min_{(\boldsymbol\rho,Q)} P_n(\boldsymbol\rho,Q,\hat p;\hat P)$, and the channel determined by $\hat P$ form a saddle point for $\min_{(\boldsymbol\rho,Q,\tilde p)}\max_{P} P_n(\boldsymbol\rho,Q,\tilde p;P)$, i.e.,

$$P_n(\hat{\boldsymbol\rho},\hat Q,\hat p;P) \le P_n(\hat{\boldsymbol\rho},\hat Q,\hat p;\hat P) \le P_n(\boldsymbol\rho,Q,\tilde p;\hat P). \qquad (15)$$

Finally, the average probability of decoding error for the ensemble of pairs of random block codes of length $n$ and rates $(R_1,R_2)$ converges to zero exponentially with increasing $n$ for all channels in the class if and only if the pair of rates $(R_1,R_2)$ lies inside the region determined by the conditions

$$R_j < I_j(Q,\hat p;\hat P), \quad j = 1,2,3 \qquad (16)$$

(with $R_3 = R_1 + R_2$), where $Q = (Q_1,Q_2)$ ranges over all pairs of probability measures on $X_1 \times X_2$.

Remark 3. The rate region determined by (16) represents the channel capacity region of the class described by (1). Similarly, the rate region determined by $R_j < E_j(\rho,Q,\hat p;\hat P)$, $j = 1,2,3$, where $Q = (Q_1,Q_2)$ ranges over all pairs of probability measures on $X_1 \times X_2$, represents for $\rho = 1$ (so that the exponent $1/(1+\rho)$ in (11a)-(11c) equals .5) the cutoff rate region.

Remark 4. Notice that (13) and (15) indicate that the measure $\hat P_x$ (singled out by Lemma 1) characterizes the worst-case (or least-favorable) channel, in terms of both the information rate and the error probability, among all the channels in the class $\mathcal{P}_{v_x}$ defined by (1).


Proof: We first prove the inequalities in (13)-(15). In particular, the right-hand inequalities in (13) for $j = 1,2,3$ are results of Jensen's inequality and the concavity of $\ln(\cdot)$. Similarly, the right-hand inequality in (14) for $j = 1$ (and $\rho > 0$; for $\rho = 0$ both sides vanish) is a result of Hölder's inequality

$$\int fg\,d\mu \le \Big[\int f^{\alpha}\,d\mu\Big]^{1/\alpha}\Big[\int g^{\beta}\,d\mu\Big]^{1/\beta} \qquad (17)$$

where $1 < \alpha < \infty$, $1 < \beta < \infty$, and $\alpha^{-1} + \beta^{-1} = 1$, when applied for $f = \tilde p(y|x_1,x_2)^{\rho/(1+\rho)^2}$, $\alpha = (1+\rho)/\rho$, $g = \tilde p(y|x_1,x_2)^{-\rho/(1+\rho)^2}\,\hat p(y|x_1,x_2)^{1/(1+\rho)}$, $\beta = 1+\rho$, and $d\mu = dQ_1(x_1)$. For $j = 2$ we only need to set $d\mu = dQ_2(x_2)$, whereas for $j = 3$ we should set $d\mu = dQ_1(x_1)\,dQ_2(x_2)$ and use double integrals instead of single integrals in (17). Finally, the right-hand inequality in (15) is true since

$$P_n(\hat{\boldsymbol\rho},\hat Q,\hat p;\hat P) \le P_n(\boldsymbol\rho,Q,\hat p;\hat P) \le P_n(\boldsymbol\rho,Q,\tilde p;\hat P)$$

where the first inequality holds because of the definition of $\hat{\boldsymbol\rho}$ and $\hat Q$, while the second inequality follows from the right-hand inequalities of (14) and the fact that $P_n$ [see (10)] is a decreasing function of $E_j$ for $j = 1,2,3$.
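The Hölder step behind the right-hand inequality in (14) can be spot-checked numerically. For a fixed $(x_2, y)$ and $j = 1$, with $f = \tilde p^{\rho/(1+\rho)^2}$, $g = \tilde p^{-\rho/(1+\rho)^2}\hat p^{1/(1+\rho)}$, $\alpha = (1+\rho)/\rho$ and $\beta = 1+\rho$ (our reading of the choices in the text), (17) raised to the power $1+\rho$ gives $[\int \hat p^{1/(1+\rho)} dQ_1]^{1+\rho} \le [\int \tilde p^{1/(1+\rho)} dQ_1]^{\rho} \int \tilde p^{-\rho/(1+\rho)}\hat p\, dQ_1$. The sketch below draws random positive metric values and a random $Q_1$ and counts violations. Only positivity matters here, so the values are not normalized. This is an illustrative check with hypothetical helper names, not part of the proof.

```python
import random


def holder_step(q, p_hat, p_tilde, rho):
    """Both sides of the inequality obtained from (17) for j = 1 at fixed (x2, y).

    q[k] = Q1(x1 = k); p_hat[k], p_tilde[k] are the metric values at x1 = k.
    """
    s = 1.0 / (1.0 + rho)
    lhs = sum(qk * ph ** s for qk, ph in zip(q, p_hat)) ** (1.0 + rho)
    rhs = (sum(qk * pt ** s for qk, pt in zip(q, p_tilde)) ** rho
           * sum(qk * pt ** (-rho * s) * ph for qk, pt, ph in zip(q, p_tilde, p_hat)))
    return lhs, rhs


rng = random.Random(0)
violations = 0
for _ in range(1000):
    k = rng.randint(2, 5)                          # size of the user-1 alphabet
    raw = [rng.random() + 1e-3 for _ in range(k)]
    q = [r / sum(raw) for r in raw]                # random input distribution Q1
    p_hat = [rng.random() + 1e-3 for _ in range(k)]
    p_tilde = [rng.random() + 1e-3 for _ in range(k)]
    rho = rng.uniform(0.01, 1.0)
    lhs, rhs = holder_step(q, p_hat, p_tilde, rho)
    violations += lhs > rhs * (1 + 1e-12)          # allow for rounding only
print(violations)   # 0
```

Equality holds when $\tilde p = \hat p$, which is the matched case; summing the pointwise inequality over $y$ and integrating over $x_2$ gives the comparison of the exponents.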

Next we prove the left-hand inequalities in (13), (14), and (15). We start with the left-hand inequality in (13), for which we may use the following sequence of arguments. First we define the functions $G_j(\hat p_x,P_x)$ to be

$$G_1(\hat p_x,P_x) = \int_Y \ln\frac{\hat p(y|x_1,x_2)}{\int_{X_1}\hat p(y|x_1',x_2)\,dQ_1(x_1')}\,dP_x(y), \qquad (18a)$$

$$G_2(\hat p_x,P_x) = \int_Y \ln\frac{\hat p(y|x_1,x_2)}{\int_{X_2}\hat p(y|x_1,x_2')\,dQ_2(x_2')}\,dP_x(y), \qquad (18b)$$

$$G_3(\hat p_x,P_x) = \int_Y \ln\frac{\hat p(y|x_1,x_2)}{\int_{X_1}\int_{X_2}\hat p(y|x_1',x_2')\,dQ_1(x_1')\,dQ_2(x_2')}\,dP_x(y), \qquad (18c)$$

and observe that we can write $I_j(Q,\hat p;P) = \int_{X_1}\int_{X_2} G_j(\hat p_x,P_x)\,dQ_1(x_1)\,dQ_2(x_2)$ for $j = 1,2,3$. Here we make use of the dependence of $\hat p$ and $P$ on $x$, and thus we employ the unabbreviated notation $\hat p_x$, $P_x$. Notice that, if we show that

$$G_j(\hat p_x,P_x) \ge G_j(\hat p_x,\hat P_x), \quad j = 1,2,3, \qquad (19)$$

for all $x \in X_1 \times X_2$, then the left-hand inequality in (13) follows. Equation (19) holds because we can write $G_j(\hat p_x,P_x) = \int_Y g_j(\hat p_x)\,dP_x$, where for $j = 1,2,3$ $g_j$ is an increasing function of $\hat p_x$, and, according to Lemma 1, $\hat P_x$ makes $\hat p_x$ stochastically smallest over all $P_x$ in $\mathcal{P}_{v_x}$.
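The stochastic-ordering argument behind (19) can be illustrated on a toy single-letter example (not the MAC functions $G_j$ themselves). Consider the total-variation class with $P_0 = (0.7, 0.2, 0.1)$, $\varepsilon = 0.1$ and counting measure $\lambda$. Choosing $c'$ and $c''$ in (7) by the usual mass-balance conditions gives $c' = 0.2$, $c'' = 0.6$ and $\hat p = (0.6, 0.2, 0.2)$. The Python sketch below samples laws $P$ from the class and checks that $E_P[g(\hat p)] \ge E_{\hat P}[g(\hat p)]$ for a few increasing $g$. It also checks the reverse inequality for decreasing functions, which is the form needed for (21). The numbers, the sampling scheme and the helper names are illustrative assumptions.

```python
import math
import random


def expect(P, g, pi):
    """E_P[g(pi(Y))] for a law P on a finite alphabet."""
    return sum(Py * g(v) for Py, v in zip(P, pi))


def sample_tv_ball(p0, eps, rng):
    """Random law (1 - t) p0 + t U with t <= eps; its total-variation distance to p0 is <= t."""
    raw = [rng.random() + 1e-12 for _ in p0]
    U = [r / sum(raw) for r in raw]
    t = rng.uniform(0.0, eps)
    return [(1 - t) * a + t * u for a, u in zip(p0, U)]


p0, eps = [0.7, 0.2, 0.1], 0.1
pi = [0.6, 0.2, 0.2]     # least-favorable density from (7) for this class (c' = 0.2, c'' = 0.6)
p_hat = pi               # counting reference measure: P-hat({y}) = pi(y)

increasing = [math.log, math.sqrt, lambda u: u ** 3]
decreasing = [lambda u: 1.0 / u, lambda u: -math.log(u)]

rng = random.Random(1)
ok = True
for _ in range(2000):
    P = sample_tv_ball(p0, eps, rng)
    for g in increasing:   # the mechanism behind (19)
        ok = ok and expect(P, g, pi) >= expect(p_hat, g, pi) - 1e-12
    for h in decreasing:   # the mirror-image statement used for (21)
        ok = ok and expect(P, h, pi) <= expect(p_hat, h, pi) + 1e-12
print(ok)  # True
```

The check succeeds because every $P$ in this class puts at least $0.6$ on the letter where $\hat p$ is largest, whereas $\hat P$ puts exactly $0.6$ there. In this sense $\hat P$ shifts as much mass as the class allows toward small values of $\hat p$.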

The left-hand inequality in (14) can be proved in a similar way. We now define the functions $H_j(\rho,\hat p_x,P_x)$ as

$$H_1(\rho,\hat p_x,P_x) = \int_Y \hat p(y|x_1,x_2)^{-\rho/(1+\rho)}\Big[\int_{X_1}\hat p(y|x_1',x_2)^{1/(1+\rho)}\,dQ_1(x_1')\Big]^{\rho}\,dP_x(y), \qquad (20a)$$

$$H_2(\rho,\hat p_x,P_x) = \int_Y \hat p(y|x_1,x_2)^{-\rho/(1+\rho)}\Big[\int_{X_2}\hat p(y|x_1,x_2')^{1/(1+\rho)}\,dQ_2(x_2')\Big]^{\rho}\,dP_x(y), \qquad (20b)$$

and

$$H_3(\rho,\hat p_x,P_x) = \int_Y \hat p(y|x_1,x_2)^{-\rho/(1+\rho)}\Big[\int_{X_1}\int_{X_2}\hat p(y|x_1',x_2')^{1/(1+\rho)}\,dQ_1(x_1')\,dQ_2(x_2')\Big]^{\rho}\,dP_x(y). \qquad (20c)$$

Since $E_j(\rho,Q,\hat p;P) = -\ln\big[\int_{X_1}\int_{X_2} H_j(\rho,\hat p_x,P_x)\,dQ_1(x_1)\,dQ_2(x_2)\big]$, the left-hand inequality in (14) is satisfied if

$$H_j(\rho,\hat p_x,P_x) \le H_j(\rho,\hat p_x,\hat P_x) \qquad (21)$$

is valid for all $x \in X_1 \times X_2$ and $\rho$ in $[0,1]$. Eq. (21) can be proved similarly to (19), that is, by defining an appropriate decreasing function of $\hat p_x$ and applying Lemma 1. The left-hand inequality in (15) is then a straightforward application of the left-hand inequality in (14) for $\boldsymbol\rho = \hat{\boldsymbol\rho}$ and $Q = \hat Q$ and the fact that $P_n$ is a decreasing function of the $E_j$'s.

Next we prove the positivity of $I_j(Q,\hat p;P)$ and $E_j(\rho,Q,\hat p;P)$, $j = 1,2,3$, for all $P$ in $\mathcal{P}_v$, all $\rho$ in $(0,1]$, all probability measures $Q = (Q_1,Q_2)$ on $X_1 \times X_2$, and $\hat p = d\hat P/d\lambda$ as singled out by Lemma 1. We first show that $I_j(Q,\hat p;P) > 0$, $j = 1,2,3$. We use the fact that $I_j(Q,\hat p;\hat P) > 0$ [the usual Liao functions are strictly positive unless the channel inputs and the channel output are independent, in which case they are zero; we exclude this case by requiring that all measures $P_x$ which belong to the uncertainty class described by (1) are not (for fixed $y$) constant functions of $(x_1,x_2)$; the proof is based on Jensen's inequality and the concavity of $\ln(\cdot)$]. Then we use the left-hand inequality in (13) to prove the desired result. Similarly, to prove that $E_j(\rho,Q,\hat p;P) > 0$, $j = 1,2,3$, we first need to show that $E_j(\rho,Q,\hat p;\hat P) > 0$. The proof of this inequality for $j = 1$ is based on applying Hölder's inequality [see (17)] for $f = \hat p(y|x_1,x_2)^{1/(1+\rho)}$, $g \equiv 1$, $\alpha = 1+\rho$, $\beta = (1+\rho)/\rho$, and $d\mu = dQ_1(x_1)$. For $j = 2$ we only need to change $d\mu$ to $d\mu = dQ_2(x_2)$, whereas for $j = 3$ we need to set $d\mu = dQ_1(x_1)\,dQ_2(x_2)$ and use double integrals instead of single integrals in (17). Again the inequalities are strict unless the channel inputs and the channel output are independent. Finally, we use the left-hand inequality in (14) to prove the desired result.

We can now proceed to the final stage of the proof of Theorem 3. First, because of (13), $R_j < I_j(Q,\hat p;\hat P)$ implies that $R_j < I_j(Q,\hat p;P)$ for all $P$ in $\mathcal{P}_v$. Then Theorem 1, applied for $\tilde p = \hat p$, implies that, for the ensemble of random block codes of rates $(R_1,R_2)$ and length $n$ described there, the average probability of decoding error converges to zero exponentially with increasing $n$. Since this is true for all $P$ in the class under consideration, the sufficiency of condition (16) is established. To prove its necessity, notice that, according to the converse of the capacity theorem for DM-MAC's, if a pair of rates $(R_1,R_2)$ lies outside the region determined by (16) as $Q = (Q_1,Q_2)$ ranges over all possible probability measures on $X_1 \times X_2$, then asymptotically good performance is violated for the channel determined by $\hat P$, which is a member of the aforementioned class. This completes the proof of Theorem 3.

At this point we discuss the choice of the operating point, that is, of a triple of the form $(\boldsymbol\rho,Q,\tilde p)$, where $\boldsymbol\rho$ is a vector parameter in $[0,1]^3$ involved in the minimization of the error probability, $Q$ is the probability measure on the input alphabet $X_1 \times X_2$, and $\tilde p$ is involved in the ML decoding at the receiver. This choice depends on the main objective of our optimization. If our main objective is to operate at the maximum transmission rates [near the boundary of the region determined by (16)], then the operating point should be $(\bar{\boldsymbol\rho},\bar Q,\hat p)$, where $\bar Q = (\bar Q_1,\bar Q_2)$ is the pair of pdf's which achieves a particular point $(R_1,R_2) = (I_1(\bar Q,\hat p;\hat P),\ I_2(\bar Q,\hat p;\hat P))$ on the boundary of the achievable region and $\bar{\boldsymbol\rho} = \arg\min_{\boldsymbol\rho} P_n(\boldsymbol\rho,\bar Q,\hat p;\hat P)$. However, if our main objective is to minimize the error probability, then $(\hat{\boldsymbol\rho},\hat Q,\hat p)$ ($\hat{\boldsymbol\rho}$ and $\hat Q$ as defined in Theorem 3) should be the operating point, and the rates of transmission $(R_1,R_2)$ should lie inside the region determined by $R_j < I_j(\hat Q,\hat p;\hat P)$, $j = 1,2,3$ ($R_3 = R_1 + R_2$), instead of that determined by (16).
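For a fixed input distribution and decoding metric, the $j$th term of the bound (10) involves only $\rho_j$, so minimizing $P_n$ over $\boldsymbol\rho$ splits into three one-dimensional problems: choose $\rho_j \in [0,1]$ to maximize $E_j(\rho_j) - \rho_j R_j$. A grid-search sketch follows. The closed-form exponents are our own hand evaluation of (11a)-(11c) for the noiseless binary adder MAC with uniform inputs and matched decoding. They serve only as an assumed example, and the function names are hypothetical.

```python
import math


def best_rho(E, R, grid=1000):
    """Maximize E(rho) - rho * R over rho in [0, 1] on a uniform grid.

    Returns (rho_j, exponent), where exponent = E(rho_j) - rho_j * R.
    """
    exponent, rho = max((E(k / grid) - (k / grid) * R, k / grid) for k in range(grid + 1))
    return rho, exponent


# Illustrative closed forms of (11a) and (11c): noiseless binary adder MAC y = x1 + x2,
# uniform Q_1, Q_2, matched decoding (assumptions made for this example only).
def E1(rho):
    return rho * math.log(2.0)            # E_2 is identical by symmetry


def E3(rho):
    return -math.log(0.5 * 4.0 ** (-rho) + 0.5 * 2.0 ** (-rho))


R1 = R2 = 0.48                             # nats per channel use
rho1, exp1 = best_rho(E1, R1)
rho3, exp3 = best_rho(E3, R1 + R2)
print(rho1, round(exp1, 4))                # 1.0 0.2131
print(round(rho3, 3), round(exp3, 4))
```

Here $\rho_1 = 1$ is optimal because $E_1$ is linear in $\rho$. The optimal $\rho_3$ lies strictly inside $(0,1)$ because the sum rate of 0.96 nats exceeds the slope of $E_3$ at $\rho = 1$ (about 0.92) while staying below its slope at $\rho = 0$, which equals $I_3 = 1.5\ln 2 \approx 1.04$ nats. A finer grid or a one-dimensional convex search could replace the grid if more precision is needed.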

As a final comment on Theorem 3, notice that, under mild continuity requirements on the convex functions $P_n(\boldsymbol\rho,\bar Q,\hat p;\hat P)$ and $P_n(\boldsymbol\rho,Q,\hat p;\hat P)$ and their derivatives, the minima involved [the minimizing arguments are $\bar{\boldsymbol\rho}$ and $(\hat{\boldsymbol\rho},\hat Q)$, respectively] exist.

A similar result holds for two-user tree codes:

Theorem 4: Under the assumptions of Theorem 3, and for any pair of rates $(R_1,R_2)$ which lies inside the region determined by (16), the probability of decoding error for the ensemble of two-user random tree codes of rates $(R_1,R_2)$ and constraint length $K$ (which is described in Theorem 2 when applied for $\tilde p = \hat p$) converges to zero exponentially with increasing $K$ for all channels in the class. Furthermore, if we define $(\boldsymbol\rho',Q') = \arg\min_{(\boldsymbol\rho,Q)} P_K(\boldsymbol\rho,Q,\hat p;\hat P)$, where

$$0 \le \rho_j \le \min\{E_j(\rho_j,Q,\hat p;\hat P)/R_j,\ 1\}, \quad j = 1,2,$$

$$0 \le \rho_3 \le \min\{E_1(\rho_3,Q,\hat p;\hat P)/R_1,\ E_2(\rho_3,Q,\hat p;\hat P)/R_2,\ E_3(\rho_3,Q,\hat p;\hat P)/(R_1+R_2),\ 1\}, \qquad (22)$$

then the operating point $(\boldsymbol\rho',Q',\hat p)$ and the channel determined by $\hat P$ form a saddle point for $\min_{(\boldsymbol\rho,Q,\tilde p)}\max_{P} P_K(\boldsymbol\rho,Q,\tilde p;P)$; i.e., the following inequalities hold for all $P$ in $\mathcal{P}_v$:

$$P_K(\boldsymbol\rho',Q',\hat p;P) \le P_K(\boldsymbol\rho',Q',\hat p;\hat P) \le P_K(\boldsymbol\rho,Q,\tilde p;\hat P) \qquad (23)$$


Proof: We first prove the inequalities in (23). The left-hand side inequality in (23) is a result of the left-hand side inequality in (14) applied for $\rho = \hat{\rho}'$ and the fact that $P_K$ is a decreasing function of the $E_j$'s for $j = 1,2,3$. The right-hand side inequality in (23) then holds because of the definition of $(\hat{\rho}', \hat{Q}')$ above, (14), and the fact that $P_K$ is a decreasing function of the $E_j$'s, $j = 1,2,3$.

The positivity requirements on $I_j(\hat{Q}, \hat{p}; P)$ and $E_j(\rho, \hat{Q}, \hat{p}; P)$, $j = 1,2,3$, which are necessary for the validity of Theorem 2 are the same as those for Theorem 1 and are satisfied, as shown during the proof of Theorem 3. To complete the proof of Theorem 4, notice that because of the left-hand inequality in (13) the rate region determined by (16) lies inside the rate region determined by $R_j < I_j(\hat{Q}, \hat{p}; P)$, $j = 1,2,3$, for all $P$ in the uncertainty class considered. Furthermore, because of (14), any $\rho = (\rho_1, \rho_2, \rho_3)$ which satisfies the conditions (22) also satisfies these conditions when $E_j(\rho, Q, \hat{p}; \hat{P})$ is replaced by $E_j(\rho, Q, \hat{p}; P)$ for $j = 1,2,3$. Consequently, all the assumptions of Theorem 2 are satisfied, and Theorem 2 applied for $p = \hat{p}$ implies that for the ensemble of two-user random tree codes of rates $(R_1, R_2)$ in the region described by (16) and constraint length $K$ the average probability of decoding error converges to zero exponentially with increasing $K$. Since this is true for all $P$ in the uncertainty class, the proof is complete.

As  discussed  at  the  end  of  the  proof  of  Theorem  3  the  choice  of  the 
operating  point  depends  on  the  objective  of  optimization.  For  tree  codes  a 
similar  choice  should  be  made  and  we  omit  the  details. 




III. ROBUST CODING FOR STATIONARY GAUSSIAN MULTIPLE-ACCESS CHANNELS
A.  Spectral  Uncertainty  Classes  Generated  by  Choquet  Capacities 
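Suppose that $X_1 = X_2 = Y = (-\infty, \infty)$ for the input and output alphabets and the discrete-time stationary Gaussian multiple-access channel (SG-MAC) is characterized by the probability transition matrix $p^{(n)}(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2)$ for $\mathbf{x}_1 \in X_1^n$, $\mathbf{x}_2 \in X_2^n$, $\mathbf{y} \in Y^n$, given by

$$p^{(n)}(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2) = (2\pi)^{-n/2}\, |R_\phi^{(n)}|^{-1/2} \exp\left\{ -\tfrac{1}{2} (\mathbf{y} - \mathbf{x}_1 - \mathbf{x}_2)^T [R_\phi^{(n)}]^{-1} (\mathbf{y} - \mathbf{x}_1 - \mathbf{x}_2) \right\}. \tag{24}$$

In (24) $|A|$ denotes the determinant of the matrix $A$, and the matrix $R_\phi^{(n)}$ is a correlation matrix of order $n$ (which because of the stationarity is a symmetric Toeplitz matrix) associated with the spectral density $\phi(\omega)$, $\omega \in [-\pi, \pi] = \Omega$.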
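As a rough numerical illustration of (24), the Python sketch below evaluates $\ln p^{(n)}(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2)$, assuming the noise autocorrelation lags $r_0, \ldots, r_{n-1}$ that make up the Toeplitz matrix $R_\phi^{(n)}$ are supplied directly (computing them from $\phi$ is not shown). The function names are hypothetical; a Cholesky factorization provides both the quadratic form and the log-determinant.

```python
import numpy as np


def toeplitz_from_autocorr(r):
    """Symmetric Toeplitz matrix with entries R[i, k] = r[|i - k|]."""
    r = np.asarray(r, dtype=float)
    n = r.size
    lags = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return r[lags]


def sg_mac_loglik(y, x1, x2, r):
    """ln p^(n)(y | x1, x2) from (24); r holds the noise autocorrelation lags."""
    z = np.asarray(y, float) - np.asarray(x1, float) - np.asarray(x2, float)
    L = np.linalg.cholesky(toeplitz_from_autocorr(r))  # R = L L^T, R must be positive definite
    w = np.linalg.solve(L, z)                           # w^T w = z^T R^{-1} z
    logdet = 2.0 * np.sum(np.log(np.diag(L)))           # ln |R|
    return -0.5 * (z.size * np.log(2.0 * np.pi) + logdet + w @ w)
```

An ML decoder would select the codeword pair that maximizes this quantity over the two codebooks.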

Suppose the spectral density $\phi$ is the Radon-Nikodym (R-N) derivative of a spectral measure $\Phi$ defined on sets $A \in \mathcal{B}$ (where $\mathcal{B}$ is the σ-algebra generated by subsets of $\Omega = [-\pi, \pi]$) with respect to the Lebesgue measure $\lambda$ on $\Omega$. The spectral measure $\Phi$ is only known to lie in the convex class defined by

$$\mathbf{\Phi}_v = \{\Phi \in \mathbf{\Phi} \mid \Phi(A) \le v(A),\ \forall A \in \mathcal{B},\ \Phi(\Omega) = v(\Omega)\}. \tag{25}$$

In (25) $\mathbf{\Phi}$ is the class of all spectral measures on $(\Omega, \mathcal{B})$. We impose on the spectral measures $\Phi$ the additional constraint $\Phi([-\pi, \pi]) = v([-\pi, \pi]) = 2\pi\sigma^2$, which is a fixed noise power constraint and transforms the normalized spectral measures $\Phi(A)/(2\pi\sigma^2)$ into probability measures; this is necessary for the validity of the Huber-Strassen theory of least favorability.

All the results about Choquet capacities and uncertainty classes of probability measures presented in Section II.A are also valid for the spectral uncertainty classes. Let $\hat{\phi}$ and $\hat{\Phi}$ denote the Huber-Strassen derivative and the least-favorable spectral measure in $\mathbf{\Phi}_v$.

We will also assume that $\hat{\Phi}$ is absolutely continuous with respect to $\lambda$ (i.e., $\hat{\Phi} \ll \lambda$). This is not very restrictive: as we can show, for the total-variation spectral class defined by

$$\mathbf{\Phi}_v = \{\Phi \mid |\Phi - \Phi_0| \le \varepsilon\},$$

where $\varepsilon$ in $[0,1]$ is known, assuming that the known nominal spectral measure satisfies $\Phi_0 \ll \lambda$ implies that $\hat{\Phi} \ll \lambda$ as well. Similar conditions on the nominal spectral measures of the contaminated mixture class [12] and the band class [13] result in $\hat{\Phi}$ being absolutely continuous with respect to $\lambda$.
B. Mismatch Coding Theorems for Two-User Block and Tree Codes

It is assumed that the channel inputs satisfy average input energy constraints of the form

$$E\{\|\mathbf{x}_j\|^2\} = nE_j, \quad j = 1,2, \tag{26}$$

where $\|\cdot\|$ is the Euclidean norm of the $n$-dimensional random vector and $E_j$ is the input energy per channel use for user $j$.

Suppose that in the presence of uncertainty about $p^{(n)}(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2)$ (induced by the spectral density $\phi$) the user mistakenly assumes that $\tilde{p}^{(n)}(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2)$ (induced by $\tilde{\phi}$) is the $n$-th order probability transition matrix governing the statistics of the SG-MAC. Let $\Phi$ and $\tilde{\Phi}$ denote spectral measures for which $\phi = d\Phi/d\lambda$ and $\tilde{\phi} = d\tilde{\Phi}/d\lambda$, respectively.

The above situation is characterized by mismatch, as in the case described in Theorem 1. Therefore we can apply Theorem 1 to this special case. We will start with the evaluation of the mismatch mutual information and the mismatch error exponent functions for the new case.

In the case of the discrete-time SG-MAC we have to deal with triplets of $n$-tuples $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y})$ whose components $(x_{1i}, x_{2i}, y_i)$ and $(x_{1j}, x_{2j}, y_j)$ may be strongly correlated. It is advantageous to follow the technique of [17, Section 4.5.2] and make the problem equivalent to that of $n$ parallel independent additive Gaussian noise (AGN) channels. This involves a unitary transformation of $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{y}$ associated with $R_\phi^{(n)}$, which preserves the mutual information relationships and the average input energy constraints.
The variance $\sigma_i^2$ of the Gaussian noise of the $i$-th channel is the $i$-th eigenvalue of the Toeplitz matrix $R_\phi^{(n)}$. Furthermore, the initial average input constraints (26) become [18, Section 7.5]

$$\frac{1}{n} \sum_{i=1}^{n} E_{ji} = E_j, \quad j = 1,2, \tag{27}$$

where the $j$-th user's input to the $i$-th channel is a zero-mean Gaussian random variable with variance $E_{ji}$. Let $E_{3i} = E_{1i} + E_{2i}$ for $i = 1, 2, \ldots, n$.
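The decomposition into parallel AGN channels can be checked numerically. The sketch below is a minimal example assuming a unit-innovation AR(1) noise with coefficient $a$ (an arbitrary test case, not taken from this paper). It diagonalizes the Toeplitz matrix with an orthogonal eigenvector matrix $U$, so that the rotated noise has the eigenvalues $\sigma_i^2$ as variances while input energies are unchanged, and it compares an eigenvalue average with the corresponding spectral average, which is the behavior the Toeplitz Distribution Theorem [19] describes as $n \to \infty$. All names are illustrative.

```python
import numpy as np


def ar1_toeplitz(a, n):
    """n x n Toeplitz correlation matrix of a unit-innovation AR(1) process,
    whose spectral density is phi(w) = 1 / (1 - 2 a cos w + a^2)."""
    lags = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return a ** lags / (1.0 - a * a)


def parallel_channels(R):
    """R = U diag(sigma2) U^T; sigma2[i] is the noise variance of parallel channel i."""
    sigma2, U = np.linalg.eigh(R)
    return sigma2, U


def spectral_mean(f, phi, m=4096):
    """(1/2pi) * integral of f(phi(w)) over [-pi, pi], midpoint rule."""
    w = -np.pi + 2.0 * np.pi * (np.arange(m) + 0.5) / m
    return np.mean(f(phi(w)))


a, n = 0.6, 64
R = ar1_toeplitz(a, n)
sigma2, U = parallel_channels(R)


def phi(w):
    return 1.0 / (1.0 - 2.0 * a * np.cos(w) + a * a)


# mean of ln(sigma_i^2) versus (1/2pi) * integral of ln(phi)
print(np.mean(np.log(sigma2)), spectral_mean(np.log, phi))
```

For this AR(1) example the first number equals $-\ln(1-a^2)/n$ exactly and the second is zero, so their gap shrinks like $1/n$.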

Once the SG-MAC has been decomposed into $n$ parallel AGN DM-MAC's, we can apply the theory of parallel AGN channels (see [18, Section 7.5]), (24), and the definitions (9) and (11) of Theorem 1a. The asymptotic (in the limit of large $n$) mismatch Liao functions and the asymptotic mismatch error exponents take the form

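$$I_j(r_j, \tilde{\phi}; \phi) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{1}{2} \ln\left(1 + \frac{E_{ji}}{\tilde{\sigma}_i^2}\right) + \frac{1}{2}\, \frac{E_{ji}}{E_{ji} + \tilde{\sigma}_i^2} \left(1 - \frac{\sigma_i^2}{\tilde{\sigma}_i^2}\right) \right] \tag{28}$$

$$E_j(\rho, r_j, \tilde{\phi}; \phi) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} \left\{ \frac{\rho - 1}{2} \ln\left[1 + \frac{E_{ji}}{(1+\rho)\tilde{\sigma}_i^2}\right] + \frac{1}{2} \ln\left[1 + \frac{E_{ji}}{(1+\rho)\tilde{\sigma}_i^2} \left(1 + \rho - \rho\, \frac{\sigma_i^2}{\tilde{\sigma}_i^2}\right)\right] \right\} \tag{29}$$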

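for $j = 1,2,3$. In (28)-(29) $\sigma_i^2$, $\tilde{\sigma}_i^2$, and $E_{ji}$, $j = 1,2,3$, for $i = 1, 2, \ldots, n$ are the eigenvalues of the $n$-th order Toeplitz matrices induced by the spectral densities $\phi$, $\tilde{\phi}$, and $r_j$ ($j = 1,2,3$), respectively. The eigenvalues $E_{ji}$, $j = 1,2$, satisfy the average input energy constraints (27) as discussed above, and $E_{3i} = E_{1i} + E_{2i}$. Here $q_j^{(n)}(\mathbf{x})$ is the $n$-th order probability density function (pdf) on the input alphabet induced by the spectral density $r_j$, $j = 1,2$. Consequently, by taking the limits as $n \to \infty$ in (27), (28), and (29) and using the discrete-time version of the Toeplitz Distribution Theorem [19], we obtain:

$$\frac{1}{2\pi} \int_{-\pi}^{\pi} r_j(\omega)\, \lambda(d\omega) = E_j, \quad j = 1,2, \tag{30}$$

$$I_j(r_j, \tilde{\phi}; \phi) = \frac{1}{4\pi} \int_{-\pi}^{\pi} \left\{ \ln\left[1 + \frac{r_j(\omega)}{\tilde{\phi}(\omega)}\right] + \frac{r_j(\omega)}{r_j(\omega) + \tilde{\phi}(\omega)} \left[1 - \frac{\phi(\omega)}{\tilde{\phi}(\omega)}\right] \right\} \lambda(d\omega), \tag{31}$$

$$E_j(\rho, r_j, \tilde{\phi}; \phi) = \frac{1}{4\pi} \int_{-\pi}^{\pi} \left\{ (\rho - 1) \ln\left[1 + \frac{r_j(\omega)}{(1+\rho)\tilde{\phi}(\omega)}\right] + \ln\left[1 + \frac{r_j(\omega)}{(1+\rho)\tilde{\phi}(\omega)} \left(1 + \rho - \rho\, \frac{\phi(\omega)}{\tilde{\phi}(\omega)}\right)\right] \right\} \lambda(d\omega) \tag{32}$$

for $j = 1,2,3$.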
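The asymptotic expressions (31) and (32) are straightforward to evaluate numerically. The following sketch uses hypothetical helper names and replaces the integrals by a midpoint grid on $[-\pi, \pi]$, so a grid mean approximates $\frac{1}{2\pi}\int \cdot\, \lambda(d\omega)$. It checks two consequences of the formulas: in the matched case with flat spectra, (31) and (32) reduce to $\frac{1}{2}\ln(1 + E/\sigma^2)$ and $\frac{\rho}{2}\ln[1 + E/((1+\rho)\sigma^2)]$, and in general the slope of $E_j$ at $\rho = 0$ equals $I_j$. The AR(1) spectrum and the "true" spectrum are arbitrary test choices.

```python
import numpy as np


def omega_grid(m=2048):
    """Midpoints of m equal cells of [-pi, pi]."""
    return -np.pi + 2.0 * np.pi * (np.arange(m) + 0.5) / m


def mismatch_liao(r, phi_tilde, phi):
    """Discretized (31): input spectrum r, decoder's phi_tilde, true phi (same grid)."""
    term = np.log1p(r / phi_tilde) + r / (r + phi_tilde) * (1.0 - phi / phi_tilde)
    return 0.5 * np.mean(term)  # (1/4pi) integral = 0.5 * (1/2pi) integral


def mismatch_exponent(rho, r, phi_tilde, phi):
    """Discretized (32); assumes 1 + c (1 + rho - rho phi/phi_tilde) > 0 on the grid."""
    c = r / ((1.0 + rho) * phi_tilde)
    term = (rho - 1.0) * np.log1p(c) + np.log1p(c * (1.0 + rho - rho * phi / phi_tilde))
    return 0.5 * np.mean(term)


w = omega_grid()
phi_t = 1.0 / (1.0 - 1.2 * np.cos(w) + 0.36)  # assumed AR(1) spectrum, a = 0.6
phi = 0.8 * phi_t + 0.1                       # an arbitrary "true" spectrum
r = np.maximum(0.0, 3.0 - phi_t)              # some nonnegative input spectrum
I = mismatch_liao(r, phi_t, phi)
h = 1e-6
dE = (mismatch_exponent(h, r, phi_t, phi) - mismatch_exponent(0.0, r, phi_t, phi)) / h
print(I, dE)  # forward difference of E_j at rho = 0 versus I_j
```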


Next we consider the input spectral densities $\hat{r}_j$ which maximize $I_j(r_j, \tilde{\phi}; \tilde{\phi})$, $j = 1,2,3$, respectively, the asymptotic Liao functions for the matched case ($\phi = \tilde{\phi}$). The spectral density $\hat{r}_j$ has been shown in [18, Section 7.5] to be defined in terms of a parameter $\gamma_j$ as:

$$\hat{r}_j(\omega) = \max\{0, \gamma_j - \tilde{\phi}(\omega)\}, \quad \omega \in [-\pi, \pi], \quad j = 1,2,3, \tag{33a}$$

where the parameter $\gamma_j$ is determined by the condition

$$E_j = \frac{1}{2\pi} \int_{\{\tilde{\phi}(\omega) < \gamma_j\}} [\gamma_j - \tilde{\phi}(\omega)]\, \lambda(d\omega), \quad j = 1,2,3, \tag{33b}$$

where $E_3 = E_1 + E_2$.
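Condition (33b) fixes the water level $\gamma_j$ only implicitly. A minimal sketch, assuming $\tilde{\phi}$ is sampled on a uniform midpoint grid so that a grid mean stands in for $\frac{1}{2\pi}\int \cdot\, \lambda(d\omega)$, finds $\gamma_j$ by bisection; this works because the right-hand side of (33b) is continuous and nondecreasing in $\gamma_j$. The function name and the AR(1) test spectrum are illustrative assumptions.

```python
import numpy as np


def water_fill(phi_tilde, energy, iters=200):
    """Solve (33b) for gamma by bisection; return (gamma, r_hat) with r_hat from (33a).

    phi_tilde: noise spectral density sampled on a uniform midpoint grid of [-pi, pi],
    so np.mean(...) approximates (1/2pi) * integral.
    """
    def used(g):  # (1/2pi) * integral of max(0, g - phi_tilde)
        return np.mean(np.maximum(0.0, g - phi_tilde))

    lo = float(np.min(phi_tilde))            # used(lo) = 0 <= energy
    hi = float(np.max(phi_tilde)) + energy   # used(hi) >= energy
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if used(mid) < energy:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    return gamma, np.maximum(0.0, gamma - phi_tilde)


m = 2048
w = -np.pi + 2.0 * np.pi * (np.arange(m) + 0.5) / m
phi_t = 1.0 / (1.0 - 1.2 * np.cos(w) + 0.36)   # assumed AR(1) spectrum, a = 0.6
gamma, r_hat = water_fill(phi_t, energy=2.0)
print(gamma, np.mean(r_hat))                    # the grid mean of r_hat is approximately 2.0
```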

We can now state the main result of this Section, which follows from Theorem 1 when it is applied to SG-MAC's:

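Theorem 5: Consider a two-user discrete-time stationary additive Gaussian channel with independent inputs and $n$-th order probability transition matrix $p^{(n)}(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2)$ induced by the spectral density $\phi(\omega)$, $\omega \in [-\pi, \pi]$. User $j$ employs the $n$-th order input pdf $q_j^{(n)}(\mathbf{x})$ induced by the spectral density $\hat{r}_j$ satisfying (33a)-(33b) for $j = 1,2$, and the decoder employs inaccurate ML decoding based on $\tilde{p}^{(n)}(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2)$ (induced by the spectral density $\tilde{\phi}$) instead of the true $p^{(n)}(\mathbf{y} \mid \mathbf{x}_1, \mathbf{x}_2)$. Consider the ensemble of pairs of random block codes of length $n$ and rates $(R_1, R_2)$ whose codewords are chosen independently with equal probability, the $n$ letters of each codeword being chosen from the input alphabet $X_j$ according to $q_j^{(n)}(\mathbf{x})$. Then, if the rates $R_1$, $R_2$ satisfy

$$R_j < I_j(\gamma_j, \tilde{\phi}; \phi), \quad j = 1,2,3 \quad (R_3 = R_1 + R_2), \tag{34}$$

where

$$I_j(\gamma_j, \tilde{\phi}; \phi) = \frac{1}{4\pi} \int_{\{\tilde{\phi}(\omega) < \gamma_j\}} \left\{ \ln\frac{\gamma_j}{\tilde{\phi}(\omega)} + \frac{1}{\gamma_j}\left[\frac{\gamma_j}{\tilde{\phi}(\omega)} - 1\right]\left[\tilde{\phi}(\omega) - \phi(\omega)\right] \right\} \lambda(d\omega), \tag{35}$$

then for large $n$ the average probability of decoding error $P_e$ is upperbounded by

$$P_e \le \sum_{j=1}^{3} \exp\{-n[E_j(\rho_j, \gamma_j, \tilde{\phi}; \phi) - \rho_j R_j]\}, \tag{36}$$

where for $\rho$ in $[0,1]$

$$E_j(\rho, \gamma_j, \tilde{\phi}; \phi) = \frac{1}{4\pi} \int_{\{\tilde{\phi}(\omega) < \gamma_j\}} \left\{ (\rho - 1) \ln\left[1 + \frac{1}{1+\rho}\left(\frac{\gamma_j}{\tilde{\phi}(\omega)} - 1\right)\right] + \ln\left[1 + \frac{1}{1+\rho}\left(\frac{\gamma_j}{\tilde{\phi}(\omega)} - 1\right)\left(1 + \rho - \rho\, \frac{\phi(\omega)}{\tilde{\phi}(\omega)}\right)\right] \right\} \lambda(d\omega). \tag{37}$$

For the validity of this Theorem it is required that the mismatch Liao functions satisfy $I_j(\gamma_j, \tilde{\phi}; \phi) > 0$ and the mismatch error exponents satisfy $E_j(\rho, \gamma_j, \tilde{\phi}; \phi) > 0$ for all $\rho$ in $[0,1]$. These positivity requirements are satisfied for the choice of $\tilde{\phi}$ and $\gamma_j$ in Theorem 7 below.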
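In terms of the water level, Theorem 5 reduces to evaluating (35) and (37). The sketch below (illustrative names, midpoint-grid integrals, water level supplied by the caller) computes both, together with the exponent $\max_{0 \le \rho \le 1}[E_j(\rho, \gamma_j, \tilde{\phi}; \phi) - \rho R_j]$ of the $j$-th term of (36); the grid search over $\rho$ is a convenience assumed here, not part of the theorem. The flat white-noise example is an arbitrary test case.

```python
import numpy as np


def liao_gamma(gamma, phi_tilde, phi):
    """(35): input water-filled against phi_tilde at level gamma, true spectrum phi."""
    on = phi_tilde < gamma
    t = gamma / phi_tilde[on]
    term = np.log(t) + (t - 1.0) * (phi_tilde[on] - phi[on]) / gamma
    return 0.5 * np.sum(term) / phi_tilde.size   # (1/4pi) integral over {phi_tilde < gamma}


def exponent_gamma(rho, gamma, phi_tilde, phi):
    """(37) for one value of rho in [0, 1]."""
    on = phi_tilde < gamma
    c = (gamma / phi_tilde[on] - 1.0) / (1.0 + rho)
    term = (rho - 1.0) * np.log1p(c) + np.log1p(c * (1.0 + rho - rho * phi[on] / phi_tilde[on]))
    return 0.5 * np.sum(term) / phi_tilde.size


def term_exponent(rate, gamma, phi_tilde, phi, rhos=np.linspace(0.0, 1.0, 1001)):
    """Grid-search value of max over rho of [E_j(rho) - rho * R_j] for the j-th term of (36)."""
    return max(exponent_gamma(rho, gamma, phi_tilde, phi) - rho * rate for rho in rhos)


flat = np.full(64, 0.5)            # white noise, sigma^2 = 0.5; E = 2 gives gamma = 2.5
cap = liao_gamma(2.5, flat, flat)  # matched case: 0.5 * ln(1 + 2 / 0.5)
print(cap, term_exponent(0.8 * cap, 2.5, flat, flat), term_exponent(1.2 * cap, 2.5, flat, flat))
```

Below $I_j$ the best exponent is positive; above it the maximum is attained at $\rho = 0$ and the bound in (36) no longer decays.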

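Remark 5. The mismatch error exponents in (37) have been evaluated for an input spectral density $\hat{r}_j$ [given by (33a)-(33b)] which maximizes the mutual information functions $I_j(r_j, \tilde{\phi}; \tilde{\phi})$ of (31). If the objective is to maximize the error exponents $E_j(\rho, r_j, \tilde{\phi}; \tilde{\phi})$ of (32) (and thus minimize the bound on $P_e$), the appropriate input spectral density $\hat{r}_{j\rho}$ is given by

$$\hat{r}_{j\rho}(\omega) = (1+\rho)\max\{0, \gamma_{j\rho} - \tilde{\phi}(\omega)\}, \quad \omega \in [-\pi, \pi], \tag{38a}$$

$$E_j = \frac{1+\rho}{2\pi} \int_{\{\tilde{\phi}(\omega) < \gamma_{j\rho}\}} [\gamma_{j\rho} - \tilde{\phi}(\omega)]\, \lambda(d\omega), \tag{38b}$$

and the mismatch error exponents become

$$E_j(\rho, \gamma_{j\rho}, \tilde{\phi}; \phi) = \frac{1}{4\pi} \int_{\{\tilde{\phi}(\omega) < \gamma_{j\rho}\}} \left\{ (\rho - 1) \ln\frac{\gamma_{j\rho}}{\tilde{\phi}(\omega)} + \ln\left[1 + \left(\frac{\gamma_{j\rho}}{\tilde{\phi}(\omega)} - 1\right)\left(1 + \rho - \rho\, \frac{\phi(\omega)}{\tilde{\phi}(\omega)}\right)\right] \right\} \lambda(d\omega). \tag{39}$$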
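The allocation of Remark 5 differs from (33a)-(33b) only by the factor $1+\rho$. A sketch under the same grid assumptions (names illustrative) solves (38b) for $\gamma_{j\rho}$ by bisection and evaluates (39). For a flat matched spectrum the result can be checked against $\frac{\rho}{2}\ln[1 + E_j/((1+\rho)\sigma^2)]$, which is what (32) gives for a constant input spectrum $r_j = E_j$.

```python
import numpy as np


def water_fill_rho(rho, phi_tilde, energy, iters=200):
    """(38a)-(38b): r = (1 + rho) * max(0, gamma - phi_tilde) with grid mean of r = energy."""
    def used(g):
        return (1.0 + rho) * np.mean(np.maximum(0.0, g - phi_tilde))

    lo, hi = float(phi_tilde.min()), float(phi_tilde.max()) + energy
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if used(mid) < energy:
            lo = mid
        else:
            hi = mid
    gamma = 0.5 * (lo + hi)
    return gamma, (1.0 + rho) * np.maximum(0.0, gamma - phi_tilde)


def exponent_rho(rho, gamma, phi_tilde, phi):
    """(39): exponent with the rho-dependent allocation at water level gamma."""
    on = phi_tilde < gamma
    t = gamma / phi_tilde[on]
    term = (rho - 1.0) * np.log(t) + np.log1p((t - 1.0) * (1.0 + rho - rho * phi[on] / phi_tilde[on]))
    return 0.5 * np.sum(term) / phi_tilde.size


flat = np.full(32, 0.5)
gamma, r = water_fill_rho(1.0, flat, energy=2.0)   # gamma is 1.5 up to rounding
print(exponent_rho(1.0, gamma, flat, flat), 0.5 * np.log(1.0 + 2.0 / (2.0 * 0.5)))
```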


For two-user tree codes of rates $(R_1, R_2)$, where $R_j = \frac{1}{N}\log_2 M_j$ bits per channel symbol, with Viterbi decoding and an ML test based on $\tilde{p}^{(n)}(\cdot \mid \cdot, \cdot)$ [the overall equivalent block length in channel input symbols is now $n = (L+K-1)N$, which corresponds to $L \log_2 M_j$ input bits of user $j$, $j = 1,2$], a similar result holds:

Theorem 6: Under the assumptions of Theorem 5, suppose a tree code for user $j$ ($j = 1,2$) has constraint length $K$ and rate $R_j = \frac{1}{N}\log_2 M_j$ bits per channel symbol satisfying (34), and consider the ensemble of pairs of random codes generated as described in Theorem 2. Then the average bit error probability of the Viterbi decoder $P_b$ is for large $K$ upperbounded by $P_K(\rho, \gamma)$, which can be obtained from (12) if we replace $E_j(\rho, Q, \hat{p}; P)$ with $E_j(\rho, \hat{r}_j, \tilde{\phi}; \phi)$ for $j = 1,2,3$. The parameters $\rho_j$ must satisfy the same conditions as for Theorem 2, provided that we replace the error exponents $E_j$ used there with the exponents defined in (37). The same positivity requirements on $I_j$ and $E_j$ as those for Theorem 5 should be satisfied.

The proof of Theorem 6 is a straightforward extension of Theorem 2 to the SG-MAC case and will be omitted. The same technique of decomposing into $n$ parallel AGN MAC's may be applied.

C. Minimax Coding Theorems for Two-User Block and Tree Codes

Next we assume that the spectral density $\phi$ which induces the transition probability matrix $p^{(n)}$ is given by $\phi = d\Phi/d\lambda$, where $\Phi$ belongs to a class of the form (25) described in Section III.A. The channel decoder employs an ML decoding rule based on $\tilde{p}^{(n)}(\cdot \mid \cdot)$ (induced by a spectral density $\tilde{\phi}$) in the way described in Theorems 5 and 6. The goal is to choose $\tilde{\phi}$ so that the asymptotic convergence of the probability of decoding error is guaranteed for all channels in the class. This is accomplished with the following result:

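Theorem 7: Suppose the spectral measure $\Phi$ [where $\phi = d\Phi/d\lambda$ induces $p^{(n)}$] belongs to a class of the form (25) and $\hat{\Phi}$ is the element of the class singled out by Lemma 1 which also satisfies $\hat{\Phi} \ll \lambda$. Suppose further that the $j$-th encoder employs an input pdf $q_j^{(n)}(\mathbf{x})$ induced by $\hat{r}_j$ defined by (33a)-(33b) for $\tilde{\phi} = \hat{\phi}$ and $\gamma_j = \hat{\gamma}_j$, and that the decoder's ML decoding rule is based on $\hat{p}^{(n)}(\cdot \mid \cdot)$ induced by $\hat{\phi}$, where $\hat{\phi} = d\hat{\Phi}/d\lambda$. Then the following inequalities are satisfied for all $\phi = d\Phi/d\lambda$ with $\Phi$ in $\mathbf{\Phi}_v$:

$$I_j(\hat{\gamma}_j, \hat{\phi}; \phi) \ge I_j(\hat{\gamma}_j, \hat{\phi}; \hat{\phi}) \ge I_j(\gamma_j, \tilde{\phi}; \hat{\phi}), \quad j = 1,2,3 \tag{40}$$

[i.e., the operating point $(\hat{\gamma}_j, \hat{\phi})$ and the channel determined by $\hat{\Phi}$ form a saddle point for $\max_{(\gamma_j, \tilde{\phi})} \min_{\phi} I_j(\gamma_j, \tilde{\phi}; \phi)$], and

$$E_j(\rho, \hat{\gamma}_j, \hat{\phi}; \phi) \ge E_j(\rho, \hat{\gamma}_j, \hat{\phi}; \hat{\phi}) \ge E_j(\rho, \gamma_j, \tilde{\phi}; \hat{\phi}), \quad j = 1,2,3, \tag{41}$$

for all $\rho$ in $[0,1]$. Furthermore, the conditions

$$R_j < I_j(\hat{\gamma}_j, \hat{\phi}; \hat{\phi}), \quad j = 1,2,3, \tag{42}$$

are sufficient and necessary to guarantee that for the ensemble of pairs of random block codes of length $n$ and rates $(R_1, R_2)$ described in Theorem 5 (when applied for $\tilde{\phi} = \hat{\phi}$) the average probability of decoding error converges to zero exponentially with increasing $n$ for all channels in the class.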

Remark 6. The inequalities $R_j < I_j(\hat{\gamma}_j, \hat{\phi}; \hat{\phi})$ determine the channel capacity region of the class described by (25), where the Liao functions are given by

$$I_j(\hat{\gamma}_j, \hat{\phi}; \hat{\phi}) = \frac{1}{4\pi} \int_{\{\hat{\phi}(\omega) < \hat{\gamma}_j\}} \ln\frac{\hat{\gamma}_j}{\hat{\phi}(\omega)}\, \lambda(d\omega). \tag{43}$$

Similarly, the quantities $E_j(\rho, \hat{\gamma}_{j\rho}, \hat{\phi}; \hat{\phi})$ obtained from (39) and (38a)-(38b) for $\tilde{\phi} = \phi = \hat{\phi}$ and given by

$$E_j(\rho, \hat{\gamma}_{j\rho}, \hat{\phi}; \hat{\phi}) = \frac{\rho}{4\pi} \int_{\{\hat{\phi}(\omega) < \hat{\gamma}_{j\rho}\}} \ln\frac{\hat{\gamma}_{j\rho}}{\hat{\phi}(\omega)}\, \lambda(d\omega) \tag{44}$$

represent the error exponents of the class; for $\rho = 1$ the inequalities $R_j < E_j(\rho, \hat{\gamma}_{j\rho}, \hat{\phi}; \hat{\phi})$ determine the cutoff rate region. Notice that the boundaries of both regions are expressed in terms of the Huber-Strassen derivative $\hat{\phi} = d\hat{\Phi}/d\lambda$, which characterizes the worst-case (least-favorable) channel.
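Once $\hat{\phi}$ is known, (43) and (44) give the boundaries of both regions directly. The following sketch, assuming a sampled $\hat{\phi}$ on a midpoint grid and using illustrative function names, computes the three capacity bounds from (43) with (33b), the three $\rho = 1$ exponents from (44) with (38b), both with $E_3 = E_1 + E_2$, and tests whether a rate pair lies inside the resulting region. The flat spectrum is an arbitrary test case.

```python
import numpy as np


def level(phi_hat, energy, scale=1.0, iters=200):
    """Water level with scale * mean(max(0, gamma - phi_hat)) = energy.
    scale = 1 gives (33b); scale = 1 + rho gives (38b)."""
    lo, hi = float(phi_hat.min()), float(phi_hat.max()) + energy
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if scale * np.mean(np.maximum(0.0, mid - phi_hat)) < energy:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def log_gain(gamma, phi_hat):
    """(1/4pi) * integral over {phi_hat < gamma} of ln(gamma / phi_hat)."""
    on = phi_hat < gamma
    return 0.5 * np.sum(np.log(gamma / phi_hat[on])) / phi_hat.size


def class_bounds(phi_hat, e1, e2, rho=1.0):
    """Bounds for j = 1, 2, 3 from (43) and from (44) at the given rho."""
    energies = (e1, e2, e1 + e2)
    capacity = [log_gain(level(phi_hat, e), phi_hat) for e in energies]
    exponent = [rho * log_gain(level(phi_hat, e, 1.0 + rho), phi_hat) for e in energies]
    return capacity, exponent


def inside(r1, r2, bounds):
    """True if R1 < B1, R2 < B2 and R1 + R2 < B3."""
    return r1 < bounds[0] and r2 < bounds[1] and r1 + r2 < bounds[2]


cap, cut = class_bounds(np.full(16, 0.5), 1.0, 2.0)
print(cap, cut, inside(0.5, 0.45, cap), inside(0.5, 0.5, cap))
```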

Proof: The sequence of steps is similar to that for the proof of Theorem 3, but the individual steps differ. We first prove the inequalities (40) and (41). To prove the right-hand side inequality in (40) we first prove the inequalities

$$I_j(\hat{r}_j, \hat{\phi}; \hat{\phi}) \ge I_j(r_j, \hat{\phi}; \hat{\phi}) \ge I_j(r_j, \tilde{\phi}; \hat{\phi})$$

for all input spectral densities $r_j$ which satisfy (30) and $j = 1,2,3$. The first part of these inequalities follows from the definition of $\hat{r}_j$ [see (33a)-(33b)]. The second part can be proved by considering the difference $I_j(r_j, \hat{\phi}; \hat{\phi}) - I_j(r_j, \tilde{\phi}; \hat{\phi})$, gathering the logarithmic terms together, applying the inequality $\ln x \ge 1 - x^{-1}$, and showing that the above difference is nonnegative. Then, since the inequalities above are valid for all $r_j$ satisfying (30), we can apply them to the input spectral density associated with $\gamma_j$ to obtain

$$I_j(\hat{\gamma}_j, \hat{\phi}; \hat{\phi}) \ge I_j(\gamma_j, \hat{\phi}; \hat{\phi}) \ge I_j(\gamma_j, \tilde{\phi}; \hat{\phi}),$$

where $\hat{\gamma}_j$ is the parameter satisfying (33b) for $\tilde{\phi} = \hat{\phi}$.


To prove the left-hand side inequality in (40) we use Lemma 1, a second lemma (Lemma 2 below), and the following fact (see eq. (46) of [?] and the justification which follows). If $g(u) \ge 0$ for all $u \in A$, $A \in \mathcal{B}$, then

$$\int_A g\, d\Phi' \le \int_A g\, d\Phi \tag{45}$$

for any spectral measure $\Phi$ with Lebesgue decomposition $\Phi = \Phi' + \Phi''$, where $\Phi' \ll \lambda$ and $\Phi'' \perp \lambda$ (i.e., $\Phi''$ is singular with respect to $\lambda$).
Lemma 2: Let $g$ be a continuous decreasing function on the real line, let $X$ be a continuous real random variable, and let $P$ be a probability measure on the σ-algebra generated by the subsets of the real line. Then the following relationship holds for all $a$ and $b$ with $a < b$:

$$\int_{[a,b]} g(X)\, dP = -\int_a^b P\{X \le t\}\, g'(t)\, dt + g(b)\,P\{X \le b\} - g(a)\,P\{X < a\}. \tag{46}$$

The proof of Lemma 2 is provided in [?] (equations (47a)-(47b)) and will not be included here.
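Identity (46) is an integration-by-parts formula and can be sanity-checked numerically. The sketch below assumes that $P$ is the distribution of $X$ and that $X$ has a density (so $P\{X < a\} = P\{X \le a\}$); the exponential distribution and $g(t) = e^{-t}$ are arbitrary test choices, for which both sides equal $(e^{-2a} - e^{-2b})/2$. The helper names are illustrative.

```python
import numpy as np


def midpoint_integral(f, a, b, m=200_000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    t = a + (b - a) * (np.arange(m) + 0.5) / m
    return (b - a) * np.mean(f(t))


def lemma2_sides(g, dg, density, cdf, a, b):
    """Left- and right-hand sides of (46) for a continuous X with given density and CDF."""
    lhs = midpoint_integral(lambda t: g(t) * density(t), a, b)
    rhs = (-midpoint_integral(lambda t: cdf(t) * dg(t), a, b)
           + g(b) * cdf(b) - g(a) * cdf(a))
    return lhs, rhs


def g(t):
    return np.exp(-t)        # continuous and decreasing


def dg(t):
    return -np.exp(-t)


def density(t):
    return np.exp(-t)        # Exp(1) density on t >= 0


def cdf(t):
    return 1.0 - np.exp(-t)


lhs, rhs = lemma2_sides(g, dg, density, cdf, 0.2, 3.0)
print(lhs, rhs, (np.exp(-0.4) - np.exp(-6.0)) / 2)
```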
We then prove the left-hand side inequalities in (40) as follows. First, we notice that $d\hat{\Phi} = \hat{\phi}\, d\lambda$. Then we define $g_j(u) = (\hat{\gamma}_j/u - 1)/\hat{\gamma}_j$ for $u < \hat{\gamma}_j$, $j = 1,2,3$, and apply (45) to obtain that for the desired inequality to hold it suffices that

$$\int_{\{\hat{\phi} < \hat{\gamma}_j\}} g_j(\hat{\phi})\, d\Phi \le \int_{\{\hat{\phi} < \hat{\gamma}_j\}} g_j(\hat{\phi})\, d\hat{\Phi}, \quad j = 1,2,3. \tag{47}$$

Since $g_j$ is a decreasing function with $g_j(\hat{\gamma}_j) = 0$ and $P\{\hat{\phi} < 0\} = 0$, we can apply Lemma 2 for $a = 0$, $b = \hat{\gamma}_j$ twice, once for $P = \Phi$ and once for $P = \hat{\Phi}$, and then use the fact that $\hat{\Phi}$ makes $\hat{\phi}$ stochastically smallest over all $\Phi$ in $\mathbf{\Phi}_v$ (Lemma 1) to show that (47) is satisfied.

To prove the right-hand side inequalities in (41) we follow a procedure similar to that for proving the right-hand side inequalities in (40). To prove the left-hand side inequalities in (41) we define the functions $h_j(\alpha) = E_j(\rho, \hat{\gamma}_j, \hat{\phi}; (1-\alpha)\hat{\phi} + \alpha\phi)$ for $\alpha$ in $[0,1]$ and $j = 1,2,3$. Then, since the $h_j$ are convex functions of $\alpha$, the desired inequalities, which can be written as $h_j(1) \ge h_j(0)$, become equivalent to $\partial h_j(\alpha)/\partial\alpha|_{\alpha=0} \ge 0$. After we evaluate the directional derivative $\partial h_j(\alpha)/\partial\alpha$ at $\alpha = 0$ and apply (45) for the function $f_j(u) = \rho(\hat{\gamma}_j/u - 1)/(\rho u + \hat{\gamma}_j)$, the desired inequalities hold if

$$\int_{\{\hat{\phi} < \hat{\gamma}_j\}} f_j(\hat{\phi})\, d\Phi \le \int_{\{\hat{\phi} < \hat{\gamma}_j\}} f_j(\hat{\phi})\, d\hat{\Phi}, \quad j = 1,2,3. \tag{48}$$

Finally, as for the proof of (47), we can use Lemma 2 for the decreasing functions $f_j$ [for which $f_j(\hat{\gamma}_j) = 0$] twice and Lemma 1 to show that (48) is satisfied for all $\Phi$ in $\mathbf{\Phi}_v$.
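The directional derivative used above can be checked against a finite difference. In this sketch (illustrative names, midpoint-grid integrals; $\hat{\phi}$ and $\phi$ are arbitrary positive test spectra, not a least-favorable pair), $h(\alpha)$ is (37) evaluated at the true spectrum $(1-\alpha)\hat{\phi} + \alpha\phi$. Differentiating (37) under the integral gives $h'(0) = -\frac{1}{4\pi}\int_{\{\hat{\phi} < \hat{\gamma}\}} f(\hat{\phi})\,(\phi - \hat{\phi})\, d\lambda$ with $f(u) = \rho(\hat{\gamma}/u - 1)/(\rho u + \hat{\gamma})$, so $h'(0) \ge 0$ amounts to $\int f(\hat{\phi})\,\phi\, d\lambda \le \int f(\hat{\phi})\,\hat{\phi}\, d\lambda$ over $\{\hat{\phi} < \hat{\gamma}\}$.

```python
import numpy as np


def exponent_37(rho, gamma, phi_hat, phi):
    """(37) with decoder spectrum phi_hat, water level gamma and true spectrum phi."""
    on = phi_hat < gamma
    c = (gamma / phi_hat[on] - 1.0) / (1.0 + rho)
    term = (rho - 1.0) * np.log1p(c) + np.log1p(c * (1.0 + rho - rho * phi[on] / phi_hat[on]))
    return 0.5 * np.sum(term) / phi_hat.size


def h(alpha, rho, gamma, phi_hat, phi):
    """Exponent along the segment from phi_hat (alpha = 0) to phi (alpha = 1)."""
    return exponent_37(rho, gamma, phi_hat, (1.0 - alpha) * phi_hat + alpha * phi)


def h_prime_at_zero(rho, gamma, phi_hat, phi):
    """Closed-form derivative: -(1/4pi) * integral of f(phi_hat) * (phi - phi_hat)."""
    on = phi_hat < gamma
    u = phi_hat[on]
    f = rho * (gamma / u - 1.0) / (rho * u + gamma)
    return -0.5 * np.sum(f * (phi[on] - u)) / phi_hat.size


m = 1024
w = -np.pi + 2.0 * np.pi * (np.arange(m) + 0.5) / m
phi_hat = 1.0 / (1.0 - 1.2 * np.cos(w) + 0.36)   # assumed test spectrum
phi = phi_hat + 0.3 * (1.0 + np.sin(w))          # another positive test spectrum
rho, gamma, eps = 0.5, 3.0, 1e-5
fd = (h(eps, rho, gamma, phi_hat, phi) - h(-eps, rho, gamma, phi_hat, phi)) / (2 * eps)
print(fd, h_prime_at_zero(rho, gamma, phi_hat, phi))
```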


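To establish the positivity of $I_j(\hat{\gamma}_j, \hat{\phi}; \phi)$ and $E_j(\rho, \hat{\gamma}_j, \hat{\phi}; \phi)$ for all $\phi = d\Phi/d\lambda$ with $\Phi$ in $\mathbf{\Phi}_v$ and $\rho$ in $[0,1]$, we use the left-hand side inequalities in (40) and (41), respectively, and the fact that both $I_j(\hat{\gamma}_j, \hat{\phi}; \hat{\phi})$ as defined by (43) and $E_j(\rho, \hat{\gamma}_j, \hat{\phi}; \hat{\phi})$ defined by

$$E_j(\rho, \hat{\gamma}_j, \hat{\phi}; \hat{\phi}) = \frac{\rho}{4\pi} \int_{\{\hat{\phi}(\omega) < \hat{\gamma}_j\}} \ln\left[1 + \frac{1}{1+\rho}\left(\frac{\hat{\gamma}_j}{\hat{\phi}(\omega)} - 1\right)\right] \lambda(d\omega)$$

are strictly positive for $j = 1,2,3$ because of their definitions.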

To complete the proof of Theorem 7 we notice that, because of (40), $R_j < I_j(\hat{\gamma}_j, \hat{\phi}; \hat{\phi})$ implies that $R_j < I_j(\hat{\gamma}_j, \hat{\phi}; \phi)$ for all $\phi = d\Phi/d\lambda$ with $\Phi$ in $\mathbf{\Phi}_v$. Then from Theorem 5 applied for $\tilde{\phi} = \hat{\phi}$ and $\gamma_j = \hat{\gamma}_j$ it follows that for the ensemble of pairs of random block codes of rates $(R_1, R_2)$ and length $n$ described there the average probability of decoding error converges to zero exponentially with increasing $n$. Since this is true for all $\Phi$ in the class under consideration, the sufficiency of condition (42) is established. To prove its necessity, notice that according to the converse of the channel capacity theorem, the violation of any of the conditions (42) implies that the average probability of decoding error converges to 1 exponentially for the channel determined by $\hat{\phi}$, which is a member of the aforementioned class.

The  discussion  for  the  choice  of  the  operating  point  is  similar  to  that 
which  followed  the  proof  of  Theorem  3  and  we  do  not  repeat  it  here. 

The corresponding result for two-user tree codes is:

Theorem 8: Under the assumptions of Theorem 4a, condition (42) guarantees that for the ensemble of pairs of random tree codes of constraint length $K$ and rates $(R_1, R_2)$ (described in Theorem 6 as applied for $\tilde{\phi} = \hat{\phi}$ and $\gamma_j = \hat{\gamma}_j$) the average probability of decoding error converges to zero exponentially with increasing $K$ for all channels in the class.

For the proof of this Theorem one can follow the same steps as for the proof of Theorem 4 and use the various definitions involved in Theorems 5, 6, and 7. The proof is therefore omitted.

It should be noted that all the results of this section can be extended to continuous-time stationary additive Gaussian bandlimited channels (e.g., with spectral densities defined on $\Omega = [\ ]$). Since Huber-Strassen derivatives of capacities with respect to σ-finite (and not finite) measures can be constructed [20], these results can possibly be extended to nonbandlimited [e.g., $\Omega = (-\infty, \infty)$] channels, provided that the relevant definitions are appropriately modified. However, several of the most useful examples of capacity classes (e.g., the ε-mixtures and variation neighborhoods) are not capacities when $\Omega$ is not compact.
IV. SUMMARY AND CONCLUSIONS

We have addressed the problem of minimax robust coding for multiple-access channels with uncertainty in their statistical description. For uncertainty classes determined by Choquet 2-alternating capacities, coding theorems were proved for discrete memoryless channels with uncertainty in the probability transition matrices and for stationary additive Gaussian channels with spectral uncertainty. It was established that for the ensembles of pairs of random block codes and random tree codes the average error probability of the decoder converges to zero exponentially with increasing block length or constraint length, respectively, for all two-user channels in the class, provided that the decoder employs a suitable robust maximum-likelihood decoding rule and the code rates lie inside a critical region. The channel capacity region and the cut-off rate region for the class of channels were evaluated. The boundaries of these regions, as well as the aforementioned robust maximum-likelihood decoding rule, are characterized in terms of a Radon-Nikodym type derivative between the upper measure of the Choquet capacity class and a Lebesgue-like measure defined on the appropriate set.